Re: Info request about the zero copy interface (2)

From: Jamie Lokier <jamie_at_shareable.org>
Date: Mon, 5 Dec 2005 17:57:52 +0000

Legolas wrote:
> Jamie Lokier wrote:
>
> >Legolas wrote:
> >
> >>>>MainLoop:
> >>>> received_size = recv(yoursocket, internal_buffer,
> >>>> internal_buffer_size, yourflags);
> >>>> buffer_size = forecast_size(received_size);
> >>>> /* forecast someway by libcurl */
> >>>> buffer = write_buffer(custom_data, &buffer_size);
> >>>> /* application may return a bigger buffer */
> >>>> ... (decode SSL, join chunks...)
> >>>> /* work on received data putting final result data into 'buffer'
> >>>> */
> >>>> write_callback(custom_data, final_size);
> >>>> /* previous code must set 'final_size' to the size of data
> >>>> written to 'buffer' */
> >>>>
> >>>>The line followed by the comment "work on received data putting final
> >>>>result data into 'buffer'" copies data into 'buffer'. That copy is
> >>>>not necessary. In what way is this doing "zero-copy"?
> >>>>
> >>>>
> >>Does not copy! I have used the verb _putting_, i.e. each final result
> >>byte or block is written directly to the 'buffer' by the algorithm
> >>intended to be in place of the ellipsis (...). Overall, there is no copy
> >>at the end of the work process (otherwise I would have expressly
> >>pointed it out).
> >>
> >
> >Ah, I think "each final result byte or block is written" is sometimes
> >an unnecessary copy :)
> >
> >Sometimes, it's unavoidable. If we use zlib, or the openssl library,
> >then they will always write their data to a caller-specified buffer,
> >so the above does not cause any extra copying in that case.
> >
> Ok, I had been thinking (up to now) that any possible algorithm allowed
> specifying a destination caller-supplied buffer (as zlib or openssl do).

Generally, those libraries do let the caller specify a destination
buffer. As long as we assume curl will use those libraries, it's
fine to assume that.

But, since you're designing an API to avoid copies whatever algorithm
is used, including algorithms inside curl's library, it's worth noting
that the zlib API isn't optimal from the point of view of minimising
copies.

(It's a good, simple and usable API, though.) Decompression often
requires a memory of the preceding output bytes (which is why zlib
maintains an output window of its own, and then copies from that to
the caller-supplied buffer).
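
To make that concrete, here is a minimal sketch of how zlib's
streaming API takes a caller-supplied output buffer. Error handling
and the usual output-full loop are trimmed, and the single-shot
Z_FINISH call assumes the whole result fits in 'dst':

    #include <string.h>
    #include <zlib.h>

    /* Inflate 'src' into the caller-supplied 'dst'. inflate() writes
     * wherever next_out points, so the API itself forces no extra
     * copy on the caller. Returns bytes written, or -1 on error. */
    static long inflate_into(const unsigned char *src, size_t srclen,
                             unsigned char *dst, size_t dstlen)
    {
        z_stream zs;
        long written = -1;

        memset(&zs, 0, sizeof zs);
        if (inflateInit(&zs) != Z_OK)
            return -1;

        zs.next_in = (Bytef *)src;     /* compressed input */
        zs.avail_in = (uInt)srclen;
        zs.next_out = dst;             /* the caller's destination */
        zs.avail_out = (uInt)dstlen;

        if (inflate(&zs, Z_FINISH) == Z_STREAM_END)
            written = (long)zs.total_out;
        inflateEnd(&zs);
        return written;
    }

Internally, though, inflate still keeps its sliding window, which is
where the extra bookkeeping copy mentioned above comes from.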

> >However, for chunked decoding, then putting "each final result byte"
> >in 'buffer' means copying the bytes from 'internal_buffer'. For small
> >runs of bytes, that is not significant (because of other overheads).
> >But for large runs, such as large HTTP chunks, it is a notable
> >extra copy. The same applies to recv() blocks that contain part HTTP
> >headers and part data.
> >
> libcurl should *smartly* choose to use the 'internal_buffer' (when
> handling overheads for example) and then switch to the direct recv()

Ok, but what should libcurl do when receiving the initial part of an
HTTP response?

The most efficient method to receive the data is to call recv() with a
reasonably large buffer. But that won't do zero copy with your API
for the first chunk of data, because it has to recv() into
'internal_buffer' to parse the headers.

You can, correctly, say that's ok, it's not a problem. I'm pointing
out how the API is non-optimal from the point of view of reducing
unbounded copies; that doesn't mean perfection is really required,
and it's your choice if you don't mind that copy.
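
To illustrate (all names here are hypothetical, not a proposed
libcurl interface): the first recv() has to land in the library's own
buffer so the headers can be parsed, and only the body bytes that
happen to follow the blank line can be handed over without a copy, as
a pointer into that same buffer:

    #define _GNU_SOURCE            /* for memmem() */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Hypothetical application callback, standing in for
     * write_callback in the earlier pseudocode. */
    static void app_write_cb(const char *data, size_t len)
    {
        fwrite(data, 1, len, stdout);
    }

    /* First read of an HTTP response: recv() must fill the library's
     * own buffer so the headers can be parsed; body bytes after the
     * blank line are passed on as a pointer into that buffer. */
    static void first_read(int fd, char *internal_buffer, size_t bufsize)
    {
        ssize_t n = recv(fd, internal_buffer, bufsize, 0);
        if (n <= 0)
            return;

        char *body = memmem(internal_buffer, (size_t)n, "\r\n\r\n", 4);
        if (body) {
            body += 4;                 /* skip the blank line */
            app_write_cb(body, (size_t)n - (size_t)(body - internal_buffer));
        }
        /* else: headers continue in the next recv(); a real parser
         * would buffer and retry. */
    }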

> Yeah, I realized it after replying, and I can't say I had understood
> that point either. I mean the first (e): what kind of data processing
> leads to multiple buffers? My idea is to have multiple calls to the
> 'write_buffer' callback, allowing the application to flush data into
> files, for example.

For example, decoding chunked data leads to multiple fragments of
contiguous data.

Let's change your example a bit: let's suppose your application is
sending the data to a TCP socket for some reason.

To get reasonable TCP performance, you must not call send() (or
write()) for each buffer received, if those are small. That causes
too many short TCP packets to be sent, on most TCP implementations,
and that can cause large time delays (it's not just syscall overhead:
short writes interact with TCP heuristics such as Nagle's algorithm
and delayed ACKs to add extra network latency).

Instead, you must wait until you've got enough data to send a
reasonably large amount in a single send call. You can do this either
by copying the smaller fragments to make them contiguous and then
using send(), or by using sendmsg() (or writev()) to send multiple
fragments at different addresses without copying them first.

The latter method requires that the library can give your application
each of the fragments, and the application can retain pointers to
those fragments until it is ready to send them.
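
A minimal sketch of the gather-write side, assuming the application
has retained pointers to each fragment (a real sender would also
handle short writes and EINTR):

    #include <sys/types.h>
    #include <sys/uio.h>

    #define MAXFRAGS 16

    /* Send up to MAXFRAGS retained fragments as one gather-write:
     * one syscall, one chance for TCP to build full-sized segments,
     * and no copy to make the fragments contiguous first. */
    static ssize_t send_fragments(int fd, const char *frag[],
                                  const size_t len[], int nfrags)
    {
        struct iovec iov[MAXFRAGS];
        int i;

        if (nfrags > MAXFRAGS)
            nfrags = MAXFRAGS;
        for (i = 0; i < nfrags; i++) {
            iov[i].iov_base = (void *)frag[i];  /* iovec is non-const */
            iov[i].iov_len = len[i];
        }
        return writev(fd, iov, nfrags);
    }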

For files, it is similar (multiple write() vs. writev()), but that
only affects the system call overhead, which is not really significant.
TCP write boundaries are more significant because of the side effects.

Now, most applications probably don't care about such details. But if
you're designing an API, already more complicated than curl's,
specifically to enable zero copy for all sorts of uses, then you
might want to allow for the possibility of applications which do care
about such details, and let them still maintain the zero-copy
behaviour.

I've found that, in general (perhaps this is too general for what you
have in mind), the rules to minimise bulk copies in a program are these:

    1. Allow the receiver the option of providing a buffer, but allow
       the receiving code to work with other buffers, when that's
       possible.

    2. Allow the sender to use the receiver's provided buffer, when
       that's possible and efficient, but use its own buffer if using
       the receiver's would cause a bulk copy from the sender's
       internal state.

    3. When the receiver uses its own buffer, and the sender already
       has the data in its own buffer, then and only then do we have
       to memcpy().

    4. Keep track of buffers (or fragments) being passed among
       different parts of the system, particularly if there's a chain
       passing the data along and processing it.

       Sometimes, it's not possible for the receiver to retain a
       pointer to a buffer for long (but that tends more to happen
       when the sender is a hardware device managed by a kernel, with
       a fixed sending buffer, rather than userspace code, so is
       perhaps not relevant).

That attains a good approximation to zero-copy when sources and sinks
are all sorts of things, including files, sockets, various protocols,
compression & encryption, filters etc. at either end.
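
As a toy illustration of rules 1-3 (every name here is made up): the
producer is offered the receiver's buffer, reports where the data
actually ended up, and the one memcpy() happens only when both sides
kept their own buffer:

    #include <stdio.h>
    #include <string.h>

    /* The producer is offered the receiver's buffer (rule 1) and
     * returns a pointer to where the data actually is: 'dst' if it
     * could generate in place (rule 2), or its own buffer when the
     * data already lives in sender-owned memory. */
    static const char *produce(char *dst, size_t cap, size_t *outlen)
    {
        static const char held[] = "data already in the sender's buffer";
        /* Pretend the data already sits in sender-owned memory, so
         * producing "into dst" would itself be the bulk copy: */
        (void)dst; (void)cap;
        *outlen = sizeof held - 1;
        return held;
    }

    int main(void)
    {
        char dst[64];
        size_t n;
        const char *src = produce(dst, sizeof dst, &n);
        if (src != dst)            /* rule 3: copy only when both   */
            memcpy(dst, src, n);   /* sides kept their own buffer   */
        fwrite(dst, 1, n, stdout);
        putchar('\n');
        return 0;
    }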

-- Jamie
Received on 2005-12-05