Re: Multi cURL connect bug

From: Evgeny Turnaev <turnaev.e_at_gmail.com>
Date: Sat, 6 Jul 2013 01:50:02 +0400

2013/7/6 Keyur Govande <keyurgovande_at_gmail.com>:
> On Fri, Jul 5, 2013 at 11:10 AM, Evgeny Turnaev <turnaev.e_at_gmail.com> wrote:
>> 2013/7/5 Evgeny Turnaev <turnaev.e_at_gmail.com>:
>>> 2013/7/5 Keyur Govande <keyurgovande_at_gmail.com>:
>>>> On Fri, Jul 5, 2013 at 5:26 AM, Evgeny Turnaev <turnaev.e_at_gmail.com> wrote:
>>>>> Hi Keyur,
>>>>> First off, I am not really an expert in curl. But reading through the
>>>>> documentation, I can't find evidence that curl_multi_perform() should
>>>>> return CURLM_CALL_MULTI_PERFORM until the connection succeeds. A connection
>>>>> can take a very long time. Why don't you change your application code so
>>>>> that it selects (poll or whatever) on the curl sockets and, when there is
>>>>> activity, calls curl_multi_perform() again? I suppose this is the typical
>>>>> usage. Someone please correct me if I am wrong.
>>>>>
>>>>
>>>> Hi Evgeny,
>>>>
>>>> Thanks for the response.
>>>>
>>>> The issue is the inconsistency of the response code.
>>>>
>>>> In the localhost case, when curl_multi_perform returns CURLM_OK, the
>>>> TCP connection is done, the HTTP connection is done, and the request
>>>> has been sent over. If I call a select() on the FDs returned by
>>>> curl_multi_fdset(), I'm actually waiting for a response from the
>>>> server. I don't want this because I want to execute other code while
>>>> the async request is being processed.
>>>>
>>>> Whereas in the slow-remote-host case, even the TCP connection is not
>>>> done when CURLM_OK is returned by curl_multi_perform(). So when I call
>>>> a select(), I am still waiting for the TCP connect to go through.
>>>>
>>>> There is no way for my code to know which of the two scenarios it is
>>>> currently experiencing.
>>>
>>> Hmm. I suppose you should not need to know this. There is nothing
>>> practical you can do with that knowledge. I thought the point of the curl
>>> library was to hide this kind of detail from you.
>>>
>>>>
>>>> Once curl_multi_perform() has a TCP connection it returns
>>>> CURLM_CALL_MULTI_PERFORM until the request is sent over. It seems like
>>>> a miss to not return the same when the TCP connection has not yet gone
>>>> through.
>>
>> One more thing: when the connect() call succeeds right away and a
>> TCP connection is established, curl_multi_perform() returns
>> CURLM_CALL_MULTI_PERFORM. This is curl telling you: "call
>> curl_multi_perform() again, I have work to do", because once the
>> connection is established curl can write the request to it.
>> But when there is no connection yet, curl can't do anything practical,
>> so there is no point in calling curl_multi_perform() again. It is better
>> to select() (or poll()) in this case. Calling curl_multi_perform() in a
>> loop until the connection is established will burn CPU; it is a form of
>> active polling.
>> At least this is my interpretation of the documentation.
>>
>
> Looking at the code it seems like if for example, the protocol connect
> for HTTPS didn't finish quickly enough, curl_multi_perform() will
> return CURLM_CALL_MULTI_PERFORM. I'm proposing that this edge-case is
> the same as not finishing a TCP connect() and both should be handled
> similarly.
>
> My goal is to be able to make an asynchronous RPC with curl. If there
> are other ways to accomplish this, please do let me know. From my
> point of view, the library is 99.9% of the way there in supporting
> this behavior, except for this one corner case around the TCP
> connection.
>
>>>
>>> Could you please quote documentation or point me to documentation
>>> saying that curl_multi_perform() should not return CURLM_OK until
>>> the connection is actually established?
>>>
>
> I can't, hence this email to the list asking for opinions on the
> matter and the proposed patch :-)

I am just curious: what CPU usage will the fetching process show when
using libcurl with your patch while trying to fetch, for example,
'http://google.com:12345/'
(or any other host that does not respond with ICMP, so that connect()
only times out after a while)?
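
To illustrate the concern, here is a minimal sketch in Python (standard
library only; it does not use libcurl). It assumes that a connected socket
pair on which nothing is ever written is a fair stand-in for a socket whose
connect() has not completed yet, and the helper name wait_for_ready() is
made up for this example. It compares zero-timeout checks in a tight loop,
which is roughly what I would expect looping on CURLM_CALL_MULTI_PERFORM to
amount to, with a single select() call that sleeps until the timeout:

```python
import select
import socket
import time


def wait_for_ready(sock, wait_s, busy):
    """Wait up to wait_s seconds for sock to become readable.

    busy=True  : zero-timeout checks in a tight loop (active polling)
    busy=False : select() sleeps in the kernel for the remaining time
    Returns (number_of_select_calls, cpu_seconds_used).
    """
    cpu_start = time.process_time()
    deadline = time.monotonic() + wait_s
    calls = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        calls += 1
        timeout = 0 if busy else remaining
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            break
    return calls, time.process_time() - cpu_start


if __name__ == "__main__":
    # Nothing is ever written to `peer`, so `sock` never becomes readable.
    # This only imitates a pending connect(); it is not a real slow host.
    sock, peer = socket.socketpair()
    try:
        for busy in (True, False):
            calls, cpu = wait_for_ready(sock, 0.2, busy)
            print(f"busy={busy}: {calls} select() calls, {cpu:.3f}s CPU")
    finally:
        sock.close()
        peer.close()
```

I would expect the busy variant to make a very large number of calls and to
use CPU for most of the wait, while the blocking variant makes about one
call and uses almost none; the exact numbers will vary by machine.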

>
>>>>
>>>> Hope that makes sense.
>>>>
>>>> Thanks,
>>>> Keyur.
>>>>
>>>>
>>>>> 2013/7/5 Keyur Govande <keyurgovande_at_gmail.com>:
>>>>>> Hello,
>>>>>>
>>>>>> We're using cURL to do some asynchronous programming in PHP.
>>>>>>
>>>>>> The code fires off a multi-curl request to get some data from a remote
>>>>>> server. Once the request is sent i.e. curl_multi_exec() (PHP's wrapper
>>>>>> around curl_multi_perform() ) returns CURLM_OK, the code continues its
>>>>>> processing and once it is all done and ready, it checks the multi-curl
>>>>>> request for a response.
>>>>>>
>>>>>> We're seeing some unusual behavior that might be a bug. We're
>>>>>> currently on 7.19.7-26.el6_1.2, but I checked the code and the same
>>>>>> behavior is in the latest version as well.
>>>>>>
>>>>>> If the curl request tries to connect to a host that is slow to respond
>>>>>> to the initial connect, then curl_multi_perform() returns CURLM_OK,
>>>>>> even though the connection is not completed and the request not sent.
>>>>>>
>>>>>> Here's a couple of sample straces showing both conditions. The PHP code is:
>>>>>>
>>>>>> // Set up the multi-curl
>>>>>> do {
>>>>>>     $cme = curl_multi_exec($this->mc_handle, $this->mc_running);
>>>>>> } while ($cme === CURLM_CALL_MULTI_PERFORM);
>>>>>> // For purposes of the test, sleep. This is where other code would execute
>>>>>> sleep(2);
>>>>>>
>>>>>> Connection to localhost where connect() is super fast:
>>>>>> 1372871818.596930 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 61 <0.000024>
>>>>>> 1372871818.596990 fcntl(61, F_GETFL) = 0x2 (flags O_RDWR) <0.000023>
>>>>>> 1372871818.597044 fcntl(61, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000015>
>>>>>> 1372871818.597086 connect(61, {sa_family=AF_INET, sin_port=htons(80),
>>>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now
>>>>>> in progress) <0.000186>
>>>>>> 1372871818.597315 poll([{fd=61, events=POLLOUT|POLLWRNORM}], 1, 0) = 1
>>>>>> ([{fd=61, revents=POLLOUT|POLLWRNORM}]) <0.000017>
>>>>>> 1372871818.597370 getsockopt(61, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000016>
>>>>>> 1372871818.597902 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 62 <0.000020>
>>>>>> 1372871818.597960 fcntl(62, F_GETFL) = 0x2 (flags O_RDWR) <0.000016>
>>>>>> 1372871818.598005 fcntl(62, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000018>
>>>>>> 1372871818.598060 connect(62, {sa_family=AF_INET, sin_port=htons(80),
>>>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now
>>>>>> in progress) <0.000057>
>>>>>> 1372871818.598162 poll([{fd=62, events=POLLOUT|POLLWRNORM}], 1, 0) = 1
>>>>>> ([{fd=62, revents=POLLOUT|POLLWRNORM}]) <0.000016>
>>>>>> 1372871818.598212 getsockopt(62, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000015>
>>>>>> 1372871818.598665 sendto(61, "GET /rpc_test.php?sleep=1
>>>>>> HTTP/1.1\r\nAccept: */*\r\nHost: www.example.com\r\n\r\n", 75,
>>>>>> MSG_NOSIGNAL, NULL, 0) = 75 <0.000159>
>>>>>> 1372871818.599054 sendto(62, "GET /rpc_test.php?sleep=1
>>>>>> HTTP/1.1\r\nAccept: */*\r\nHost: www.example.com\r\n\r\n", 75,
>>>>>> MSG_NOSIGNAL, NULL, 0) = 75 <0.000072>
>>>>>> 1372871818.601071 nanosleep({2, 0}, 0x7fff7ce2f4c0) = 0 <2.000130>
>>>>>>
>>>>>> Connecting to a slow remote host:
>>>>>> 1372871825.806179 connect(61, {sa_family=AF_INET, sin_port=htons(80),
>>>>>> sin_addr=inet_addr("10.255.58.37")}, 16) = -1 EINPROGRESS (Operation
>>>>>> now in progress) <0.000195>
>>>>>> 1372871825.806417 poll([{fd=61, events=POLLOUT|POLLWRNORM}], 1, 0) = 0
>>>>>> (Timeout) <0.000017>
>>>>>> 1372871825.806919 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 62 <0.000024>
>>>>>> 1372871825.806976 fcntl(62, F_GETFL) = 0x2 (flags O_RDWR) <0.000016>
>>>>>> 1372871825.807020 fcntl(62, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000016>
>>>>>> 1372871825.807064 connect(62, {sa_family=AF_INET, sin_port=htons(80),
>>>>>> sin_addr=inet_addr("10.255.58.37")}, 16) = -1 EINPROGRESS (Operation
>>>>>> now in progress) <0.000173>
>>>>>> 1372871825.807278 poll([{fd=62, events=POLLOUT|POLLWRNORM}], 1, 0) = 0
>>>>>> (Timeout) <0.000021>
>>>>>> 1372871825.807688 poll([{fd=61, events=POLLOUT|POLLWRNORM}], 1, 0) = 0
>>>>>> (Timeout) <0.000016>
>>>>>> 1372871825.807824 poll([{fd=62, events=POLLOUT|POLLWRNORM}], 1, 0) = 0
>>>>>> (Timeout) <0.000016>
>>>>>> 1372871825.809776 nanosleep({2, 0}, 0x7fff04b17530) = 0 <2.000174>
>>>>>>
>>>>>> For the slow remote host, the sleep is getting triggered even though
>>>>>> the GET request was not yet sent.
>>>>>>
>>>>>> Reading through the code in lib/multi.c, in the
>>>>>> CURLM_STATE_WAITCONNECT block, if connected is false, then the result
>>>>>> should be CURLM_CALL_MULTI_PERFORM in order to keep calling poll until
>>>>>> the connections are established and the requests flushed out.
>>>>>>
>>>>>> Here's the diff on the 7.19.7 branch that I think will fix the issue.
>>>>>> The patch for 7.31.0 would be exactly the same:
>>>>>> diff --git a/lib/multi.c b/lib/multi.c
>>>>>> index 48df928..dc5ec48 100644
>>>>>> --- a/lib/multi.c
>>>>>> +++ b/lib/multi.c
>>>>>> @@ -1077,6 +1077,7 @@ static CURLMcode multi_runsingle(struct Curl_multi *multi,
>>>>>> break;
>>>>>> }
>>>>>>
>>>>>> + result = CURLM_CALL_MULTI_PERFORM;
>>>>>> if(connected) {
>>>>>> if(!protocol_connect) {
>>>>>> /* We have a TCP connection, but 'protocol_connect' may be false
>>>>>> @@ -1095,8 +1096,6 @@ static CURLMcode multi_runsingle(struct Curl_multi *multi,
>>>>>> /* after the connect has completed, go WAITDO or DO */
>>>>>> multistate(easy, multi->pipelining_enabled?
>>>>>> CURLM_STATE_WAITDO:CURLM_STATE_DO);
>>>>>> -
>>>>>> - result = CURLM_CALL_MULTI_PERFORM;
>>>>>> }
>>>>>> }
>>>>>> break;
>>>>>>
>>>>>> Is there anything I'm missing, or another way to accomplish this?
>>>>>>
>>>>>> Thanks,
>>>>>> Keyur.
>>>>>> -------------------------------------------------------------------
>>>>>> List admin: http://cool.haxx.se/list/listinfo/curl-library
>>>>>> Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2013-07-05