Re: Problem with curl provided example "fopen.c" involving multi interface

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Wed, 18 May 2011 22:22:22 -0700

On Wed, May 18, 2011 at 06:41:40PM -0400, Robert Banfield wrote:
> fopen.c uses the multi interface to read data from the stream into a
> buffer that automatically grows via realloc. When url_fread() is
> called, two things happen: (1) curl_multi_perform is called, and (2)
> once enough bytes have been read from the stream, data is copied out
> of the buffer and the remaining data is shifted down to the beginning
> of the buffer. As an artificial example with just one curl handle: if
> you read the stream of a very large transfer 1 byte at a time,
> curl_multi_perform would be called so frequently relative to the
> amount of data actually being read off the buffer that the buffer
> would grow until memory was exhausted.
>
> A more interesting problem I've discovered uses two curl handles and
> alternating reads. Early on, regardless of which handle is used when
> calling url_fread(), curl_multi_perform (called after waiting on
> select()) sometimes transfers data into the first curl handle's
> buffer and sometimes into the second's. But after each buffer has
> received a couple of megabytes, curl_multi_perform suddenly transfers
> data only to the first handle's buffer, ignoring the second until the
> first is finished. This is a problem when url_fread() is called on
> the second curl handle: it has to wait until the first is done, and
> the first transfer must be buffered in its entirety in dynamically
> resized, ever-growing memory until it finishes or the process is
> killed or crashes.
>
> The real-world problem I'm trying to solve is using FUSE to read from
> an HTTP stream. Basically, a couple of files are opened and read
> through, but sometimes one of those files gets no data because the
> multi interface is busy adding data to the file not being accessed,
> that is, until the kernel kills the process once it runs out of
> memory.
>
> Any ideas on a solution?
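
For context, the buffering path at issue in fopen.c boils down to
roughly this (abbreviated from the example's write callback; URL_FILE
is the example's own struct, and error handling is omitted):

    /* libcurl hands each received chunk to this callback, which appends
       it to a heap buffer that realloc() grows without any upper bound */
    static size_t write_callback(char *ptr, size_t size, size_t nitems,
                                 void *userp)
    {
      URL_FILE *url = (URL_FILE *)userp;
      size_t nbytes = size * nitems;

      url->buffer = realloc(url->buffer, url->buffer_pos + nbytes);
      memcpy(url->buffer + url->buffer_pos, ptr, nbytes);
      url->buffer_pos += nbytes;
      return nbytes;
    }

Nothing in that path ever pushes back: as long as curl_multi_perform()
delivers data faster than url_fread() consumes it, the buffer grows.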

A possible solution would be to set a buffer limit (a couple of megabytes,
perhaps) and check the size of the realloced buffer before it's enlarged
to ensure that it doesn't exceed the limit. If it does, then call
curl_easy_pause() on the handle to pause further transfers on it (letting
TCP flow control take effect). In the url_fread() function, unpause
the handle once the buffer size is reduced below that limit (or maybe
half the limit, to introduce some hysteresis). That will eliminate the
first problem by capping the buffer size, and the second problem by
preventing one connection from monopolizing the transfer indefinitely.
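
A minimal sketch of that scheme, assuming fopen.c's URL_FILE plus one
added "paused" flag (the threshold values and the field name are
illustrative, not part of the example):

    #define BUF_LIMIT  (2*1024*1024)  /* pause once this much is buffered */
    #define BUF_RESUME (BUF_LIMIT/2)  /* resume below this, for hysteresis */

    /* in the write callback, before growing the buffer: */
    if(url->buffer_pos + nbytes > BUF_LIMIT) {
      url->paused = 1;
      /* returning this pauses receiving on the handle, like calling
         curl_easy_pause(handle, CURLPAUSE_RECV); libcurl holds on to the
         rejected chunk and delivers it again once the handle is unpaused */
      return CURL_WRITEFUNC_PAUSE;
    }

    /* in url_fread(), after data has been copied out of the buffer and
       the remainder shifted down: */
    if(url->paused && url->buffer_pos < BUF_RESUME) {
      url->paused = 0;
      curl_easy_pause(url->handle.curl, CURLPAUSE_CONT);
    }

With reception paused, libcurl stops reading the socket, the kernel's
receive window fills, and TCP flow control throttles the server until
the reader catches up.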

>>> Dan
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2011-05-19