curl-library

Re: Serious sporadic bug in libcurl 7.10

From: Ralph Mitchell <ralphmitchell_at_gmail.com>
Date: Fri, 10 Jun 2005 02:19:28 -0500

Have you tried dropping a "sleep (5);" in at the end of the main loop?
I'm wondering if maybe when curl *thinks* it has finished sending,
there could still be data buffered somewhere in the TCP stack. You're
running back around the main loop and starting the next send real
quick.

I seem to remember reading somewhere that some kinds of network
packets can be "expedited", thereby passing others destined for the
same host. I think it was mostly a problem where the sender could
tell the receiver "all done, drop the connection", and that message
would overtake the last few data packets. The receiver would then
miss those packets.

Obviously that's not the problem here, but maybe the startup (SYN)
packet is expedited the same way, so the PUT headers overtake the last
data?? That's a wild, completely unsupported guess... :)

Also, try a packet sniffer like Ethereal. It would be interesting to
see whether the future PUT headers arrive in a packet of their own or
embedded in a curl data packet. If they're in the *middle* of a curl
data block, then curl might be at fault. If they're at the end of a
curl block or in a completely independent packet, it might be the TCP
stack getting creative...

Disclaimer: I only know enough about network internals to be dangerous
- what I just said could be completely off the wall, but it might give
you someplace to start looking... :)

Ralph Mitchell

On 6/9/05, Gary Lawrence Murphy <garym_at_teledyn.com> wrote:
>
> I'm experiencing serious, strange and sporadic data corruption using
> PUT requests from curl-devel-7.10.4-1 and curl-devel-7.10.6-6.rhel3
>
> The application is dead-simple: a small textbook C libcurl program
> (source attached) reads filenames from stdin, and PUTs each file to
> the URL given on the command line. This is run from another dispatch
> program that creates a unix fifo pipe and spawns the C program when
> there are new files for the subscriber; the C program lingers for
> 5 minutes in case there are more, then exits.
>
> When bandwidth is not an issue, it works perfectly, never any
> transmission errors.
>
> Over DSL-grade connections we see HTTP headers /inserted/ into the
> body of the transmitted file:
>
> <team>
> <team-metadata xmlns:fs="java.io.File" team-key="l.mlb.com-t.7" alignment="away">
> <name first="Cleveland" last="Indians" />
> </team-metadata>
> <team-stats score="0" event-outcome="undecided" />
> </team>
> <team>
> <team-metadata xPUT / HTTP/1.1
> Host: 99.99.99.99:8083
> Pragma: no-cache
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
> Content-Length: 30434
>
> mlns:fs="java.io.File" team-key="l.mlb.com-t.5" alignment="home">
> <name first="Toronto" last="Blue Jays" />
> </team-metadata>
> <team-stats score="0" event-outcome="undecided" />
> </team>
>
> And here's what is really most disturbing: Those are not the headers
> for the /current/ PUT transaction, they are the headers for the _next_
> transaction! Inserted into the midst of the current document stream.
>
> How could this happen? Data corruption from the FUTURE?? Cool.
>
> The C program isn't multi-threaded, there's only one instance per
> target host, and it's totally /linear/: read a filename, PUT that
> file, repeat.
>
> Here's a debug trace of the above transaction:
>
> 2005-06-09 18:08:08 DEBUG: PUT baseball/l.mlb.com/schedules/xt.2951321-cle-sked.xml
> 2005-06-09 18:08:08 DEBUG: Sending 169537 bytes
> 2005-06-09 18:08:08 DEBUG: 65536 bytes read
> * Re-using existing connection! (#0)
> * Connected to target-host.net (99.99.99.99) port 8083
> > PUT / HTTP/1.1^M
> Host: 99.99.99.99:8083^M
> Pragma: no-cache^M
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*^M
> Content-Length: 169537^M
> ^M
> 2005-06-09 18:08:08 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:08 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:08 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:08 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:08 DEBUG: 65536 more bytes read
> 2005-06-09 18:08:08 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:09 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:09 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:09 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:09 DEBUG: 38465 more bytes read
> 2005-06-09 18:08:09 DEBUG: 16384 bytes sent
> * Operation timed out with 0 out of -1 bytes received
> * Connection #0 left intact
> 2005-06-09 18:08:09 SENT: baseball/l.mlb.com/schedules/xt.2951321-cle-sked.xml
>
> The Ruby WEBrick records nothing special, just the expected content
> length, which happens to have the future header jazz injected into
> the stream.
>
> The headers for the immediately following message, as recorded by the
> libcurl application:
>
> * Re-using existing connection! (#0)
> * Connected to target-host.net (99.99.99.99) port 8083
> > PUT / HTTP/1.1^M
> Host: 99.99.99.99:8083^M
> Pragma: no-cache^M
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*^M
> Content-Length: 30434^M
> ^M
> 2005-06-09 18:08:23 DEBUG: 16384 bytes sent
> 2005-06-09 18:08:23 DEBUG: 14050 bytes sent
> 2005-06-09 18:08:23 DEBUG: transfer complete
> * Connection #0 left intact
>
> Those are the headers, all right.
>
>
> We have verified the bug using both a Perl HTTP::Daemon and a Ruby
> WEBrick test program. On a Windows receiver the error is frequent:
> about 20% of the documents are corrupted, with no pattern to the
> particular doc type, its size, or the time of day. On a Linux
> receiver on 500Mb/s DSL the error rate is much lower, one in 500,
> but it is still not zero.
>
> It's also not happening on a clean 16k boundary; the above error
> happened 141744 bytes into the XML file.
>
> Some notes on the attached C file (I hope the list allows attachments):
>
> 1) I moved the following code from outside the read/PUT loop to be
> inline each cycle, just before we send the PUT, in case these
> functions have side effects that clear their buffers. I don't know;
> I'm grasping at straws ;)
>
> curl_easy_setopt(curl, CURLOPT_URL, target_url);
> curl_easy_setopt(curl, CURLOPT_UPLOAD, TRUE);
>
> 2) The docs say that easy_init and easy_cleanup are to be done at
> the beginning and end of a "session" -- I take that to mean that a
> flurry of PUTs is a "session".
>
> Is this too generous? Should I be doing the init/cleanup for each
> individual PUT? Should I shorten my wait time, in case the remote
> receiver is timing out before I do and confusing my next doc?
> (But then, why do the headers get injected, rather than the stream
> simply switching from Doc-A to Doc-B?)
>
> The general pattern of PUTs tends to be clumps of update XML docs,
> as many as several dozen within a few seconds, then a pause of up
> to a few minutes, or, during off-hours, pauses of hours. The
> program stays alive for 5 minutes awaiting new content orders, then
> uses SIGALRM to close the session and exit.
>
> 3) There is a suspicious 1-second timeout -- do not be alarmed.
> Another bug we were wrestling with saw the remote Windows clients
> pausing up to 60 seconds after receiving each PUT request, so, to
> "just make it work", the chief engineer ordered a "hurl and forget"
> policy. This bug with the injected HTTP headers was happening even
> when the timeout was 60 seconds (but maybe that is still a
> significant clue?)
>
> This bug has us stymied; it makes no sense. While sending a document
> through a PUT request, we see log messages for each 16k block filled
> by the read callback, and we see the transaction close, either by an
> acknowledgement or because of the timeout, and we see all of that
> happen _before_ the log shows the handshake for the subsequent
> transaction.
>
> So how can HTTP headers from the _next_ PUT get into the middle of one
> of those 16k blocks?
>
> Any hints, tips, theories or wild guesses are eagerly welcome.
>
> --
> Gary Lawrence Murphy <garym_at_teledyn.com> ==============================
> ====================== The present moment is a powerful goddess (Goethe)
Received on 2005-06-10