curl-library

Re: Zero-copy interface

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Tue, 20 Jul 2010 22:36:12 +0200 (CEST)

On Mon, 19 Jul 2010, G Drukier wrote:

> The way I've done this in the past is to have the code acquiring the data,
> which in this case would be libcurl, assign an appropriate block of memory
> itself for each incoming piece of data and pass that on. The problem with
> that approach is knowing what allocator is being used so that the buffer can
> be subsequently deallocated properly. This problem is alleviated by your
> approach in which the user allocates and provides the buffer.

Yes. The application knows how it wants the data, so it is far better if the
application allocates the buffer or provides it by some other means. If libcurl
did the allocation, you can bet that it wouldn't be enough. An application could
use mmap(), malloc(), alloca() or other ways. libcurl could not offer such fancy
ways.

> If I understand the rest of your proposal, the user would set the zero-copy
> option, and then, when libcurl hits the location where it needs to write

Well, it would basically ask the application to provide a buffer. The
application can do the alloc at that point if it wants.

> it calls the getbuffer callback to get the buffer. When it is done reading,
> it then calls the write callback. This seemed inefficient to me, until I
> thought about it further in the context of how libcurl works.

In what way is that more inefficient than if libcurl itself does a mere alloc?
It would be a single extra function call, which on most systems and in most
surroundings would cost a few extra cycles. I don't consider that "inefficient".

The benefit is that then the application gets the data served in exactly that
memory it wants/needs.
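
To illustrate the getbuffer-then-write flow, here is a minimal C sketch. It
does not use libcurl at all: the callback types and the names app_getbuffer,
app_write and fake_transfer are hypothetical, invented for this illustration,
and are not libcurl API. The memcpy() stands in for the recv() a real transfer
would do, so it is the only time the data is moved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types for this sketch; NOT part of libcurl. */
typedef void *(*getbuffer_cb)(size_t wanted, size_t *avail, void *userdata);
typedef size_t (*write_cb)(char *ptr, size_t size, size_t nmemb,
                           void *userdata);

/* Application-owned memory; it could just as well come from mmap(). */
struct app_state {
    char storage[64];
    size_t used;
};

/* The application hands out the next free region of its own memory. */
static void *app_getbuffer(size_t wanted, size_t *avail, void *userdata)
{
    struct app_state *st = userdata;
    size_t room = sizeof(st->storage) - st->used;

    *avail = wanted < room ? wanted : room;
    return room ? st->storage + st->used : NULL;
}

/* The write callback only records the data; it is already in place. */
static size_t app_write(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct app_state *st = userdata;
    size_t n = size * nmemb;

    assert(ptr == st->storage + st->used); /* no copy needed */
    st->used += n;
    return n;
}

/* Stand-in for the library side: ask for a buffer, fill it, report it. */
static size_t fake_transfer(const char *incoming, size_t len,
                            getbuffer_cb getbuf, write_cb wr, void *userdata)
{
    size_t done = 0;

    while (done < len) {
        size_t avail = 0;
        char *buf = getbuf(len - done, &avail, userdata);

        if (!buf || avail == 0)
            break;                           /* application has no room */
        memcpy(buf, incoming + done, avail); /* stands in for recv() */
        if (wr(buf, 1, avail, userdata) != avail)
            break;                           /* application aborted */
        done += avail;
    }
    return done;
}
```

Calling fake_transfer("hello world", 11, app_getbuffer, app_write, &st) on a
zeroed struct app_state leaves the bytes directly in st.storage, at the cost
of one extra function call per chunk.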

> I would have suggested instead, that the setopt mechanism be used to set the
> location of the buffer to write to. The buffer should be subject to the
> CURL_MAX_WRITE_SIZE minimum. Then the data gets read and the write callback
> gets called as usual.

...

> The user then does what he likes with the memory, and, if desired allocate
> new memory. The problem, as I then realized, is that the write callback
> doesn't have the handle, and so can't run setopt.

It could figure out the handle by itself; that wouldn't be hard or complicated.
But curl_easy_setopt() is used to set options _before_ transfers, not during
them. So it would A) not work with the way the current code works and, more
importantly, B) introduce setopts that are set at run-time, which would add
confusion and would not follow the general API usage style of libcurl.

But really, you're suggesting basically the same as I suggested, but you
want to set the address with a function where I suggested a callback from
libcurl that gets the address back. Not a terribly big difference.

> Further, although I've only used the easy interface until now, I imagine
> that having only one buffer is going to cause a problem for multi.

Not really, multi is just several easy transfers and each easy transfer needs
its own single buffer at all times.

> So your proposal makes sense in that it minimizes the amount of modification
> needed, and leaves the memory allocation problems in the user's hands. It
> would require one or two new options. One to signify that zero-copy is to be
> used, and one to set the getbuffer callback. Alternatively, and more
> economically, the latter would set the former; the default state being a
> NULL callback, and bypass of the code.

Right, as the zero copy would require a callback we can just as well have the
callback option imply zero copy.

> A question though. If the buffer is larger than CURL_MAX_WRITE_SIZE, would
> the getbuffer callback need to notify libcurl that this is the case?

No.

> Or is CURL_MAX_WRITE_SIZE a hard(ish) limit, and libcurl shouldn't be called
> upon to do more than this?

CURL_MAX_WRITE_SIZE is a compile-time limit on how much data libcurl will ever
store in the buffer before it calls the write function. As this concept would
use the same general libcurl concepts, that same rule would apply to the zero
copy buffers.
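
A small C sketch of that rule, under stated assumptions: SKETCH_MAX_WRITE_SIZE
is a local stand-in for CURL_MAX_WRITE_SIZE (set here to 16384, the value in
curl.h), and deliver() and count_write() are hypothetical names, not libcurl
functions. It shows why an application buffer larger than the limit needs no
special notification: each write call is capped at the limit regardless.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for CURL_MAX_WRITE_SIZE; an assumption of this sketch. */
#define SKETCH_MAX_WRITE_SIZE 16384

struct stats {
    size_t calls;   /* number of write callback invocations */
    size_t largest; /* largest chunk seen in a single call */
    size_t total;   /* total bytes delivered */
};

/* Write callback that only keeps statistics about the chunks it sees. */
static size_t count_write(char *ptr, size_t size, size_t nmemb,
                          void *userdata)
{
    struct stats *s = userdata;
    size_t n = size * nmemb;

    (void)ptr;
    s->calls++;
    if (n > s->largest)
        s->largest = n;
    s->total += n;
    return n;
}

/* Hypothetical library side: never passes more than the limit per call,
   however large the application's buffer is. */
static void deliver(char *buf, size_t buflen, size_t incoming,
                    struct stats *s)
{
    assert(buflen >= SKETCH_MAX_WRITE_SIZE); /* buffer must fit one chunk */

    while (incoming > 0) {
        size_t n = incoming < SKETCH_MAX_WRITE_SIZE
                       ? incoming : SKETCH_MAX_WRITE_SIZE;

        /* a real transfer would recv() up to n bytes into buf here */
        if (count_write(buf, 1, n, s) != n)
            return; /* application aborted */
        incoming -= n;
    }
}
```

With a 65536-byte buffer and 40000 incoming bytes, this makes three write
calls (16384 + 16384 + 7232 bytes); the extra buffer space simply goes unused.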

> At the other limit, currently if the user wants a smaller buffer, and more
> frequent reference to the write callback, the CURLOPT_BUFFERSIZE option is
> available, but its satisfaction is not guaranteed. What I'm concerned about
> is excessive demands on memory in the zero-copy case where the incoming data
> chunks are small with respect to CURL_MAX_WRITE_SIZE.

That _could_ be done, but CURL_MAX_WRITE_SIZE is used in many places in the
code as a build-time limit. It would have to be converted to a run-time limit,
and that would take an effort to make happen (and to make sure that all of
those situations still work when you lower the limit)...

-- 
  / daniel.haxx.se
Received on 2010-07-20