curl-library
Re: [PATCH] always multi v5
Date: Fri, 11 Jan 2013 18:45:05 +0100
On Fri, Jan 11, 2013, Daniel Stenberg wrote:
> Can you first check and see if there's a time descripancy between the
> trace64 log and the http_server.log ? I can't make sense of the times in
> there:
These are the best kind of timestamps we can get on this kind of
'system', not too bad accuracy and not too bad resolution. But
certainly not good enough in order to timestamp with 'real'
microsecond resolution and accuracy each call, so trying to mentally
sync two logs with timestamps is just an approximation.
> http server:
>
> 00:25:55.470998 ====> Client connect
> 00:25:55.470998 TCP_NODELAY set
> 00:25:55.470998 accept_connection 3 returned 4
> 00:25:55.470998 accept_connection 3 returned 0
> 00:25:55.490998 Read 84 bytes
> 00:25:55.490998 Process 84 bytes request
> 00:25:55.490998 Got request: GET /64 HTTP/1.1
>
> trace:
>
> 00:25:55.461476 == Info: About to connect() to 127.0.0.1 port 8990 (#0)
> 00:25:55.471476 == Info: Trying 127.0.0.1...
> 00:25:55.491476 == Info: connected
> 00:25:55.491476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)
> 00:25:55.491476 == Info: 0xb4f1a0 is at send pipe head!
> 00:25:55.491476 == Info: STATE: CONNECT => DO handle 0xb72f68; line 1025
> (connection #0)
> 00:25:55.491476 == Info: Server auth using Digest with user 'testuser'
> 00:25:55.491476 => Send header, 84 bytes (0x54)
>
> See? The server claims it got the GET already before libcurl logged
> 'connected' !
Yeah, and server claims it read the 84 bytes 0,000478 seconds before
curl logs that the 84 bytes have been sent. With this we know that
there's at least that offset between timestamps in both logs, although
other events might be apart even with double that offset.
Fun, Eh? ;-)
> And then for the next sequence of interest to me:
>
> server log:
>
> 00:25:55.510998 Response sent (245 bytes) and written to log/server.response
> 00:25:55.510998 instructed to close connection after server-reply
> 00:25:55.510998 ====> Client disconnect 0
>
> trace log:
>
> 00:25:55.511476 == Info: Connection #0 to host 127.0.0.1 left intact
> 00:25:55.511476 == Info: Issue another request to this URL:
> 'http://127.0.0.1:8990/64'
> 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180
> 00:25:55.511476 == Info: Re-using existing connection! (#0) with host
> 127.0.0.1
> 00:25:55.511476 == Info: Connected to 127.0.0.1 (127.0.0.1) port 8990 (#0)
>
> ... which looks like SocketIsDead() didn't think the connection it tries to
> re-use is dead, but the time stamps indicate it should've seen it.
(server log 00:25:55.510998) + 0.000478 = (trace.log 00:25:55.511476)
But lack of a better resolution, makes it impossible to truly tell
apart timestamps for:
> 00:25:55.511476 == Info: Found bundle for host 127.0.0.1: 0xb7c180
> 00:25:55.511476 == Info: Re-using existing connection! (#0) with host
I've been tracing SocketIsDead() and certainly when called in this
case it does not detect that the connection has been closed on the
other side, with a select timeout of zero. :-(((((
Given that all this is running on cygwin with its own implementation
of select on top of pthreads which calls underlying windows select
with whatever implementation it has on top of message passing and
windows 'threading model' on a non-realtime system, I consider
absolutely normal that in such conditions there exist race conditions
which makes it impossible to detect closing connection instantly.
As a matter of fact if select call in SocketIsDead() is given a
timeout of 60ms then test #64 succeeds, given that in that case it
detects that the connection has been closed.
I'll do more tests in order to find a better workaround that does not
involve using a timeout.
> Then there's clearly a bug when things went south due to the dead connection
> being attempted for sending the request on:
>
> 00:25:55.571476 == Info: Recv failure: Connection reset by peer
> 00:25:55.571476 == Info: Connection died, retrying a fresh connect
> 00:25:55.571476 == Info: Recv failure: Connection reset by peer
> 00:25:55.571476 == Info: Connection died, retrying a fresh connect
>
> ... it doesn't look like this is truly trying a fresh connect, it seems to
> mostly just _say_ it!
Yep, something could probably be improved. As I've mentioned,
increassing select timeout in SocketIsDead() to 60ms allows test #64
to pass, but on the other hand the timeout has to be increased to more
than 1000ms in order to make test #160 pass which makes the server
wait 1 second before closing the connection.
Even with all the bizarreness of this cygwin on windows testing
framework, multi should be capable of properly handling remote
disconnection at any point where network activity is taking place.
I'll keep digging on Monday.
-- -=[Yang]=- ------------------------------------------------------------------- List admin: http://cool.haxx.se/list/listinfo/curl-library Etiquette: http://curl.haxx.se/mail/etiquette.htmlReceived on 2013-01-11