Re: Problem with curl provided example "fopen.c" involving multi interface
Date: Thu, 19 May 2011 15:45:53 -0400
On 05/19/2011 01:22 AM, Dan Fandrich wrote:
> On Wed, May 18, 2011 at 06:41:40PM -0400, Robert Banfield wrote:
>> fopen.c uses the multi interface to read data from the stream using a
>> buffer that automatically increases in size via realloc. As
>> url_fread is called, two things happen: (1) curl_multi_perform is
>> called, and (2) once enough bytes have been read from the stream,
>> data is copied out of the buffer and the remaining data in the buffer
>> is shifted down to the beginning of the buffer. Artificially, using
>> just one curl handle, if you tried to read just 1 byte of the stream
>> of a very large transfer, curl_multi_perform would be called so
>> frequently relative to the amount of data actually being read off the
>> buffer, that the buffer would grow until memory was exhausted.
>> A more interesting problem I've discovered uses two curl handles and
>> alternating reads. Early on, and regardless of which handle is used
>> when calling url_fread(), when curl_multi_perform is called (after
>> waiting on select()), sometimes it transfers data to the first curl
>> handle's buffer, sometimes it transfers data to the second curl
>> handle's buffer. After each buffer has received a couple megabytes of
>> data, curl_multi_perform suddenly only transfers data to the first
>> handle's buffer, ignoring the second until the first is finished.
>> This is a problem when url_fread() is called on the second curl
>> handle... it has to wait until the first is done transferring, and
>> the first must be transferred in its entirety into a dynamically
>> resized, ever-expanding buffer until it finishes or the process is
>> killed or crashes.
>> The real world problem I'm trying to solve is using FUSE to read from
>> an http stream. Basically a couple files are getting opened and read
>> through, but sometimes one of those files gets no data because the
>> multi interface is busy adding data to the file not being accessed,
>> that is, until the kernel kills the process after it runs out of
>> memory.
>> Any ideas on a solution?
> A possible solution would be to set a buffer limit (a couple of megabytes,
> perhaps) and check the size of the realloced buffer before it's enlarged
> to ensure that it doesn't exceed the limit. If it does, then call
> curl_easy_pause() on the handle to pause further transfers on it (letting
> TCP flow control take effect). In the url_fread() function, unpause
> the handle once the buffer size is reduced below that limit (or maybe
> half the limit, to introduce some hysteresis). That will eliminate the
> first problem by capping the buffer size, and the second problem by
> preventing one connection from monopolizing the transfer indefinitely.
Thank you for the insight and the pointer to curl_easy_pause() with the
magic read/write callback function return values.
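For anyone following along, the pause/unpause decision could be sketched
like this (the struct and thresholds here are hypothetical stand-ins for
the state kept in fopen.c; in a real libcurl write callback, "pause"
means returning CURL_WRITEFUNC_PAUSE instead of the byte count, and
url_fread() would unpause with curl_easy_pause(handle, CURLPAUSE_CONT)):

```c
#include <stddef.h>

/* Hypothetical per-handle state, mirroring the URL_FILE buffer
 * bookkeeping in fopen.c. */
struct buf_state {
    char  *buffer;
    size_t size;    /* bytes currently buffered */
    size_t limit;   /* cap, e.g. 2 MB */
    int    paused;
};

/* Write callback side: pause once the incoming chunk would push the
 * buffer past the limit (return CURL_WRITEFUNC_PAUSE in real code). */
static int should_pause(const struct buf_state *s, size_t incoming)
{
    return s->size + incoming > s->limit;
}

/* url_fread() side: unpause only once the buffer has drained below
 * half the limit, giving the hysteresis Dan mentioned (then call
 * curl_easy_pause(handle, CURLPAUSE_CONT) in real code). */
static int should_unpause(const struct buf_state *s)
{
    return s->paused && s->size < s->limit / 2;
}
```

Unpausing at half the limit rather than just under it avoids rapid
pause/unpause flapping when reads and writes are similarly sized.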
I found another problem in that the still_running variable in the
URL_FILE struct tracks whether *any* transfer is in progress, not just
that particular URL_FILE. This is a problem whenever two simultaneous
transfers do not complete at the same time. Any data left to be
transferred in the remaining handle would be forced into main memory
until the transfer ended.
After using curl_easy_pause(), instead of blowing up main memory, it
would deadlock. I found the easiest fix to be checking the status of
the maxfd variable after curl_multi_fdset. If maxfd==-1 then all other
transfers have either reached their max buffer size and paused, or
finished their transfer. Not the most efficient way of determining an
end-of-transfer, but the only one I can think of that doesn't add undue
complexity to the example.
It also seemed to me that the tests run by this example were not quite
correct, so I changed them slightly.
Attached is my patch to fopen.c.
Please let me know if there's anything more I can do.
- text/plain attachment: fopen.c.patch