
curl-library

Re: Info request about the zero copy interface (2)

From: Legolas <legolas558_at_email.it>
Date: Mon, 05 Dec 2005 19:46:59 +0100

Jamie Lokier wrote:

>Legolas wrote:
>
>
>>Jamie Lokier wrote:
>>
>>
>>
>>>Legolas wrote:
>>>
>>>
>>>
>>>>>>MainLoop:
>>>>>> received_size = recv(yoursocket, internal_buffer,
>>>>>> internal_buffer_size, yourflags);
>>>>>> buffer_size = forecast_size(received_size);
>>>>>> /* forecast someway by libcurl */
>>>>>> buffer = write_buffer(custom_data, &buffer_size);
>>>>>> /* application may return a bigger buffer */
>>>>>> ... (decode SSL, join chunks...)
>>>>>> /* work on received data putting final result data into 'buffer'
>>>>>> */
>>>>>> write_callback(custom_data, final_size);
>>>>>> /* previous code must set 'final_size' to the size of data
>>>>>> written to 'buffer' */
>>>>>>
>>>>>>The line followed by the comment "work on received data putting final
>>>>>>result data into 'buffer'" copies data into 'buffer'. That copy is
>>>>>>not necessary. In what way is this doing "zero-copy"?
>>>>>>
>>>>It does not copy! I used the verb _putting_, i.e. each final result
>>>>byte or block is written directly to 'buffer' by the algorithm
>>>>intended to stand in place of the ellipsis (...). Overall, there is
>>>>no copy at the end of the work process (otherwise I would have
>>>>expressly pointed it out).
>>>>
>>>>
>>>>
>>>Ah, I think "each final result byte or block is written" is sometimes
>>>an unnecessary copy :)
>>>
>>>Sometimes, it's unavoidable. If we use zlib, or the openssl library,
>>>then they will always write their data to a caller-specified buffer,
>>>so the above does not cause any extra copying in that case.
>>>
>>>
>>>
>>Ok, I was thinking (up to now) that any possible algorithm allowed to
>>specify a destination caller-specific buffer (as zlib or openssl).
>>
>>
>
>Generally, those libraries do specify a destination caller-specified
>buffer. As long as we assume curl will use those libraries, then it's
>fine to assume that.
>
>But, since you're designing an API to avoid copies whatever algorithm
>is used, including algorithms in curl's library, it's worth noticing
>that the zlib API isn't optimal from the point of view of minimising
>copies.
>
>(It's a good, simple and usable API, though). Decompression often
>requires a memory of the preceding written bytes (which is why zlib
>maintains an output buffer, and then copies from that to the
>caller-supplied buffer).
>
>
>
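(As an aside on the zlib point above: the reason a history-keeping decompressor ends up copying can be sketched as below. This is a toy illustration in C with invented names, not zlib's real API; a real decoder writes its window while resolving back-references, then copies out.)

```c
/* Toy sketch (NOT zlib): a decompressor that must keep a history
 * window of its own recent output, and therefore copies that window
 * into the caller-supplied buffer. All names here are invented. */
#include <string.h>
#include <stddef.h>

#define WINDOW 32

struct toy_inflater {
    unsigned char window[WINDOW]; /* decoder-owned history of recent output */
    size_t have;                  /* bytes currently held in the window */
};

static size_t toy_inflate(struct toy_inflater *z,
                          const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap)
{
    size_t n = in_len < WINDOW ? in_len : WINDOW;
    memcpy(z->window, in, n);   /* the decode step must write the window */
    z->have = n;
    if (n > out_cap)
        n = out_cap;
    memcpy(out, z->window, n);  /* the extra copy discussed above */
    return n;
}
```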
>>>However, for chunked decoding, then putting "each final result byte"
>>>in 'buffer' means copying the bytes from 'internal_buffer'. For small
>>>runs of bytes, that is not significant (because of other overheads).
>>>But for large runs, such as large HTTP chunks, then it is a notable
>>>extra copy. The same applies to recv() blocks that contain part HTTP
>>>headers and part data.
>>>
>>>
>>>
>>libcurl should *smartly* choose to use the 'internal_buffer' (when
>>handling overheads for example) and then switch to the direct recv()
>>
>>
>
>Ok, but what should libcurl do when receiving the initial part of a
>HTTP response?
>
>The most efficient method to receive the data is to call recv() with a
>reasonably large buffer. But that won't do zero copy with your API
>for the first chunk of data, because it has to recv() into
>'internal_buffer' to parse the headers.
>
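(For what it's worth, the header case above can be sketched like this. The helper is hypothetical, not libcurl code: the headers force a recv() into an internal buffer, but the body bytes that arrived in the same buffer can still be handed to the application by pointer rather than copied again.)

```c
/* Sketch: locating the start of the body inside the internal buffer
 * that was used to receive the HTTP response headers. Hypothetical
 * helper, not part of libcurl. */
#include <string.h>
#include <stddef.h>

/* Returns a pointer to the first body byte inside buf, or NULL if the
 * blank line ending the headers has not arrived yet. Assumes buf holds
 * a prefix of the raw response. */
static const char *find_body(const char *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return buf + i + 4;  /* body fragment: a pointer, not a copy */
    return NULL;
}
```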
>You can, correctly, say that's ok, it's not a problem. I'm pointing
>out how the API is non-optimal from the point of view of reducing
>unbounded copies; that doesn't mean perfection is really required,
>and it's your choice if you don't mind that copy.
>
>
>
>>Yeah, I realized it after replying, and I shouldn't have said I
>>understood that point as well. I mean the first (e): what kind of data
>>processing leads to multiple buffers? My idea is to have multiple calls
>>to the 'write_buffer' callback, allowing the application to flush data
>>into files, for example.
>>
>>
>
>For example, decoding chunked data leads to multiple fragments of
>contiguous data.
>
>Let's change your example a bit: let's suppose your application is
>sending the data to a TCP socket for some reason.
>
>To get reasonable TCP performance, you must not call send() (or
>write()) for each buffer received, if those are small. That causes
>too many short TCP packets to be sent, on most TCP implementations,
>and that can cause large time delays (it's not just overhead: it
>interacts with the TCP heuristics to cause extra network delays).
>
>Instead, you must wait until you've got enough data to send a
>reasonably large amount in a single send call. You can do this either
>by copying the smaller fragments to make them contiguous and then
>using send(), or by using sendmsg() (or writev()) to send multiple
>fragments at different addresses without copying them first.
>
>The latter method requires that the library can give your application
>each of the fragments, and the application can retain pointers to
>those fragments until it is ready to send them.
>
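(The writev() approach described above might look like this in C. A minimal POSIX sketch; the function name and two-fragment shape are invented for illustration.)

```c
/* Sketch: sending several non-contiguous fragments in one system call
 * with writev(2), instead of copying them into one contiguous buffer
 * and calling send(). */
#include <sys/uio.h>
#include <unistd.h>
#include <stddef.h>

static ssize_t send_fragments(int fd, const char *frag1, size_t len1,
                              const char *frag2, size_t len2)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)frag1;  /* fragments stay where they already are */
    iov[0].iov_len  = len1;
    iov[1].iov_base = (void *)frag2;
    iov[1].iov_len  = len2;
    /* one system call, one write boundary, no userspace copy */
    return writev(fd, iov, 2);
}
```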
>For files, it is similar (multiple write() vs. writev()), but that
>only affects the system call overhead which is not really significant.
>TCP write boundaries are more significant because of the side effects.
>
>Now, most applications probably don't care about such details. But if
>you're designing an API, which is more complicated than curl's
>already, specifically to enable zero copy for all sorts of use, then
>you might want to allow the possibility of applications which do care
>about such details, and let them still maintain the zero copy
>behaviour.
>
>I've found that, in general (perhaps this is too general for what you
>have in mind), the rule to minimise bulk copies in a program is this:
>
> 1. Allow the receiver the option of providing a buffer, but allow
> the receiving code to work with other buffers, when that's
> possible.
>
> 2. Allow the sender to use the receiver's provided buffer, when
> that's possible and efficient, but use its own buffer if using
> the receiver's would cause a bulk copy from sender's internal state.
>
> 3. When the receiver uses its own buffer, and the sender already
> has the data in its own buffer, then and only then do we have to memcpy().
>
> 4. Keep track of buffers (or fragments) being passed among
> different parts of the system, particularly if there's a chain
> passing the data along and processing it.
>
> Sometimes, it's not possible for the receiver to retain a
> pointer to a buffer for long (but that tends more to happen
> when the sender is a hardware device managed by a kernel, with
> a fixed sending buffer, rather than userspace code, so is
> perhaps not relevant).
>
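(Rules 1-3 above could be sketched roughly like this in C. All names are invented; it is a toy negotiation between one sender and one receiver, not a real API.)

```c
/* Toy sketch of the buffer-negotiation rules: the receiver may offer a
 * buffer; the sender uses it when it can, otherwise exposes its own;
 * a memcpy() happens only when both sides keep their own storage. */
#include <string.h>
#include <stddef.h>

struct xfer {
    unsigned char *recv_buf;  /* receiver-provided buffer, may be NULL */
    size_t recv_cap;
};

/* Sender delivers 'len' bytes currently living in 'sender_buf'.
 * Returns the buffer the receiver should read the data from. */
static const unsigned char *deliver(struct xfer *x,
                                    const unsigned char *sender_buf,
                                    size_t len)
{
    if (x->recv_buf && len <= x->recv_cap) {
        /* Rule 3: both ends insist on their own buffer, so this is
           the one place a bulk copy is genuinely required. */
        memcpy(x->recv_buf, sender_buf, len);
        return x->recv_buf;
    }
    /* Rule 2: no usable receiver buffer; hand back the sender's own. */
    return sender_buf;
}
```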
>That attains a good approximation to zero-copy with combinations of
>sources and sinks being all sorts of things, including files, sockets,
>various protocols, compression & encryption, filters etc. at either end.
>
>-- Jamie
>
>
>
You have talked about many known problems with zero-copy interfaces,
describing almost every possible bottleneck, but my intent was not to
write a zero-copy client interface API (I am not able to), let alone a
perfect one :).
I was thinking about a _basic_ zero-copy interface, though I agree that
writing a perfect one would be better. It would also be great to use
some TCP heuristics to improve performance, but I intended to write a
few lines of pseudo code ignoring them.
About the overhead problem: in that case we must accept a compromise
anyway; the less copying, the better.
In my opinion, trying to take into account every point you have
mentioned so far is a bad idea for portability reasons.
A better idea would instead be to provide an almost-zero-copy interface.
I will attach a pseudo source snippet as soon as possible, but don't
try to take it apart looking for a zero-copy interface: a *real* zero-copy
interface needs a much bigger effort.

--
    Giuseppe
Received on 2005-12-05