Re: Avoid copying data in CURLOPT_WRITEFUNCTION callback

From: Rich Gray <rgray_at_plustechnologies.com>
Date: Wed, 15 Feb 2012 10:08:11 -0500

Konstantin Miller wrote:
> Hi!
>
> Is there any way to avoid copying data between the buffer that is passed to
> the callback function, which is set with CURLOPT_WRITEFUNCTION, and my own
> buffer? Can I tell libcurl that I would like to reuse its buffer and that I
> will free it myself later on? Or, even better, that I will give it back to
> libcurl once I'm done with it?

Currently, an application that wants to process libcurl data in which
logical sequences may span callbacks has two choices:
1) copy the data somewhere else and reassemble it there (sketched just
below), or
2) implement a state machine.
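
For reference, choice 1 with today's API usually looks something like
the following sketch (the struct and names are mine; the options are the
standard CURLOPT_WRITEFUNCTION/CURLOPT_WRITEDATA pair):

  #include <stdlib.h>
  #include <string.h>
  #include <curl/curl.h>

  struct memory {
    char *data;
    size_t size;
  };

  static size_t write_cb(char *ptr, size_t size, size_t nmemb,
                         void *userdata)
  {
    size_t total = size * nmemb;
    struct memory *mem = userdata;
    char *grown = realloc(mem->data, mem->size + total + 1);

    if(!grown)
      return 0;                    /* out of memory: abort the transfer */
    mem->data = grown;
    memcpy(mem->data + mem->size, ptr, total);  /* the copy in question */
    mem->size += total;
    mem->data[mem->size] = '\0';
    return total;
  }

  /* in the setup code: */
  struct memory mem = { NULL, 0 };
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &mem);

Every byte gets copied at least once on its way into the application's
buffer, which is exactly the overhead under discussion.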

Here are two possible ways libcurl might make it easier:

1) Allow the libcurl caller to specify a callback data buffer

A function along the lines of

CURLcode curl_set_callback_buffer(CURL *handle, void *buf, size_t len)

would direct libcurl to deposit its data into the given buffer.
(Alternatively, I suppose one could do this with a pair of CURLOPTs to
specify the buffer address and size.) This capability would have to be
usable from within a write callback, so the application could do things
like have libcurl place successive deliveries in contiguous memory, as
the sketch below illustrates. (I don't believe I've seen anything about
calling libcurl functions from within a callback.)
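
If such a function existed and could legally be called from inside a
write callback, the application side might look like this sketch
(entirely hypothetical, of course, since curl_set_callback_buffer() is
the proposal itself):

  #include <curl/curl.h>

  #define HUMONGOUS (1024*1024)

  static char bigbuf[HUMONGOUS];
  static size_t used;

  static size_t write_cb(char *ptr, size_t size, size_t nmemb,
                         void *userdata)
  {
    CURL *handle = userdata;  /* easy handle passed via CURLOPT_WRITEDATA */
    size_t total = size * nmemb;

    /* ptr points into bigbuf itself; no copy was made */
    used += total;

    /* aim the next delivery at the space just past this one, so that
       successive callbacks land in contiguous memory */
    curl_set_callback_buffer(handle, bigbuf + used,
                             sizeof(bigbuf) - used);
    return total;
  }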

The nice thing about accepting a buffer from the user is flexibility.
The space can simply be a chunk of char buffer[HUMONGOUS], it can come
from malloc(), it can sit in shared memory, it could be an mmap'd disk
file...

2) Allow the caller to tell libcurl it has not processed all of the callback
data.

This one is not so much about efficiency as about potentially making
callback code simpler. The caller might set a new
CURLOPT_WRITEFUNCTION_PARTIALS flag, which would allow it to return
fewer than the number of bytes given to a callback function. Instead of
treating the short return as an error, libcurl would move the
unprocessed data to the front of its buffer and present it again on the
next callback, along with more data.

An example of this would be receiving a 4k chunk of callback data in a
program that is scraping data, line by line, from text files. At the
end of the buffer sit 40 bytes of a partial line. By returning 4k-40
from the callback, the callback function defers dealing with the line
fragment and instead gets the entire line, plus more data, on the next
callback. Both the copy and the state machine are avoided. (A sketch
follows.)
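
A sketch of such a callback, again hypothetical since the flag does not
exist, and with process_lines() standing in for whatever the program
does with complete lines:

  #include <stddef.h>

  extern void process_lines(const char *buf, size_t len);  /* placeholder */

  /* assumes CURLOPT_WRITEFUNCTION_PARTIALS is set, so returning fewer
     than size*nmemb bytes would mean "re-deliver the tail next time"
     rather than "abort the transfer" */
  static size_t write_cb(char *ptr, size_t size, size_t nmemb,
                         void *userdata)
  {
    size_t total = size * nmemb;
    size_t complete = total;

    /* back up to just past the last newline; the rest is a fragment */
    while(complete && ptr[complete - 1] != '\n')
      complete--;

    if(!complete)
      return total;     /* no newline at all; take the whole chunk */

    process_lines(ptr, complete);    /* handle whole lines only */

    /* the (total - complete) leftover bytes reappear at the start of
       the next callback, followed by fresh data */
    return complete;
  }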

This one does have some edge cases. The chunk of leftover data
presumably could not be "too large", to avoid leaving libcurl with too
little remaining buffer space. And at the end of the transfer, if a
partial fragment still remains, the callback would have to be invoked
one last time with just that fragment, and the application would have
to recognize that the size equals the amount it declined to process on
the previous call.

I haven't had a chance to look at libcurl internals, so perhaps there are
reasons why these suggestions would not work. I offer them anyway...

Cheers!
Rich
