curl-library
Re: Info request about the zero copy interface (2)
Date: Mon, 05 Dec 2005 23:32:52 +0100
Daniel Stenberg wrote:
> On Mon, 5 Dec 2005, Legolas wrote:
>
>> A great idea would be instead to provide an almost-zero copy
>> interface. I will attach A.S.A.P. a pseudo source snippet, but don't
>> try to take it apart looking for a zero copy interface: for a *real*
>> zero copy interface a major effort is needed.
>
>
> I think we can design an interface now that allows for a pretty good
> zero-copy interface, but it doesn't have to mean that libcurl would
> take full advantage of every aspect of the zero-copy from day 1. I
> agree that we don't have to overdo it: just start with a simple plain
> approach and expand it later if/when we feel the need and have the
> energy for it.
>
> Given the nature of libcurl, as very portable, on top of the transport
> layer and using a whole range of 3rd party libraries, we will of
> course have to live with a number of copies no matter how hard we try.
>
As a very bad example, take a look at the client pseudo code below, which
I wrote taking into account what I have read so far. I have also worked
my original idea into it, in a soft way. However, it's not my idea, it's
just an idea :)
/*
zcopycli.c - Pseudo code for a theoretical client
application able to handle zero-copy
(c) legolas558 _at_ email.it
Read more at:
http://curl.haxx.se/mail/lib-2005-12/0000.html
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
This pseudo code is going to cover two general types of application,
the first with APPLICATION_ONE defined, the second with
APPLICATION_TWO (messy messy messy!)
1. applications that need to just 'look into' a buffer of
downloaded data (file downloading for example)
2. applications that need to stream the entire data into a larger
ordered buffer (various purposes)
A third, important, type of application is not covered here:
the case of an application able to handle multiple buffers.
Excuse the general code chaos; I am exploiting the fact that this is
just pseudo code...
I was also thinking about using dedicated structs between the
library and the application to allow a better exchange of
information about the buffers in use; this approach is more likely
to be used once a multi-buffer design replaces this one.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please use the mailing list to offer further advice, and also to point out
corrections if needed!
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memset(), memcpy() */
#include <curl/curl.h>
/**/
#define APPLICATION_ONE
/**/
/*
#define APPLICATION_TWO
*/
/* see the main() code for an explanation of the following */
#ifdef APPLICATION_ONE
FILE *output_file;
#else /* APPLICATION_TWO */
#define BUFFER_SIZE (1024*32)
#endif
typedef struct _simple_buf {
char *base; /* start of the allocated block */
char *here; /* used by application 2 */
char *top; /* one past the end of the allocated block */
struct _simple_buf *next; /* that's just a placeholder,
and also pads well... */
} simple_buf;
#define SB_SIZE(sb) ((int)((sb)->top - (sb)->base))
#define SB_AVAIL(sb) ((int)((sb)->top - (sb)->here))
#define SB_DELTA(sb) ((int)((sb)->here - (sb)->base))
#define SB_IN_RANGE(sb, ptr) (((char *)(ptr) >= (sb)->base) && ((char *)(ptr) <= (sb)->top))
/* this function rounds 'amount' up to the next multiple of
'granularity' (at least one unit), so that the caller can allocate
buffers with a minimum fixed granularity */
int granularity_fix(int amount, int granularity)
{
div_t dv;
dv = div(amount, granularity);
amount = (dv.quot + (dv.rem>0));
if (!amount) amount++;
return amount*granularity;
}
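/* make sure the buffer holds at least 'desired_size' bytes;
returns the resulting size, or 0 if realloc() fails
(the old buffer is left untouched in that case) */
int sb_assert_size(simple_buf *sb, int desired_size) {
int delta, new_size;
char *new_base;
if (desired_size > SB_SIZE(sb)) {
new_size = granularity_fix(desired_size, 1024);
delta = SB_DELTA(sb);
/* keep the old pointer until realloc() succeeds, otherwise
a failure would leak the block and lose the data */
new_base = realloc(sb->base, new_size);
if (new_base == NULL)
return 0;
sb->base = new_base;
sb->here = sb->base + delta;
sb->top = sb->base + new_size;
return new_size;
}
return SB_SIZE(sb);
}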
typedef void * (awb_prototype(void *custom_data, int *desired_size));
/* from here on, AWB stands for Allocate Write Buffer.
This function is defined by the application and specified to libcurl
through an improbable CURLOPT_AWB parameter, forcing the library to
use it instead of the internal one. A quote from J.Loker follows:
"The library will call it when data is not already available in a
fixed location due to algorithms such as chunked decoding, zlib and
SSL decryption."
Basically, this function must return a buffer AT LEAST as large as
the value specified in '*desired_size'. The function may then adjust
that value to the size actually available in the buffer (at this
point I can't yet tell whether the library needs this information,
though).
An application following behaviour (1) will try to re-use the same
buffer for any single-use operation it needs.
In case (2) the application will instead reallocate its buffer and
pass the new pointer to the library.
A similar behaviour is expected from the 'write_callback' function
(see below).
*/
void *allocate_write_buffer(void *custom_data, int *desired_size) {
simple_buf *sb;
int size;
sb = (simple_buf *)custom_data;
#ifdef APPLICATION_ONE
size = sb_assert_size(sb, *desired_size);
#else /* APPLICATION_TWO */
size = sb_assert_size(sb, SB_DELTA(sb) + *desired_size);
#endif
if (!size)
return NULL; /* out of memory */
/* report how much room is actually available at 'here' */
*desired_size = SB_AVAIL(sb);
return sb->here;
}
typedef int (wcb_prototype(void *custom_data, void *real_buffer,
int data_length, int writeable));
/* WCB stands for Write Call Back, this function is called when a discrete
amount of data has been prepared for the client application.
Note: if the library has called 'allocate_write_buffer' and is handing
back a buffer obtained that way, 'data_length' is expected to be less
than or equal to the '*desired_size' value.
The application can tell who owns 'real_buffer' through the macro
SB_IN_RANGE.
*/
int write_callback(void *custom_data, void *real_buffer,
int data_length, int writeable) {
/* Note: 'writeable' is ignored in this example */
#ifdef APPLICATION_ONE
return (int)fwrite(real_buffer, 1, data_length, output_file);
#else
simple_buf *sb;
sb = (simple_buf *) custom_data;
if (!SB_IN_RANGE(sb, real_buffer)) {
/* library is providing a private buffer,
all our work here is to copy from that to our big streamed one.
Again, a quote from J.Loker:
"When receiver use its own buffer, and sender already has the
data in its own buffer, then and only then do we have to memcpy()"
*/
if (!sb_assert_size(sb, SB_DELTA(sb) + data_length))
return 0; /* out of memory: nothing was consumed */
memcpy(sb->here, real_buffer, data_length);
}
/* whether we copied it or the library wrote it in place,
the data now sits at 'here', so advance past it */
sb->here += data_length;
return data_length;
#endif
}
int main(void)
{
simple_buf buffer;
CURL *curl;
CURLcode res;
curl = curl_easy_init();
if(!curl) return -1;
memset(&buffer, 0, sizeof(simple_buf));
#ifdef APPLICATION_ONE
output_file = fopen("index.htm", "wb");
if (!output_file) {
curl_easy_cleanup(curl);
return -1;
}
/* since output will be flushed to the file system,
we do not use any starting memory buffer.
Please note that, if needed (when the library
calls 'allocate_write_buffer'), it will be
dynamically allocated anyway. */
#else /* APPLICATION_TWO */
sb_assert_size(&buffer, BUFFER_SIZE);
/* since we need the entire downloaded file into
an ordered memory stream, we allocate the huge
memory block before everything begins */
#endif
curl_easy_setopt(curl, CURLOPT_URL, "curl.haxx.se");
/* set the new 'allocate_write_buffer' handler */
curl_easy_setopt(curl, CURLOPT_AWB, &allocate_write_buffer);
/* Note: CURLOPT_WRITEFUNCTION would have a different meaning */
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_callback);
/* CURLOPT_WRITEDATA would be used (as now) to set a custom parameter
('custom_data') for calls to 'allocate_write_buffer' & 'write_callback'
*/
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
res = curl_easy_perform(curl);
if (res != CURLE_OK) {
fprintf(stderr, "Perform error: %s\n", curl_easy_strerror(res));
free(buffer.base);
curl_easy_cleanup(curl);
#ifdef APPLICATION_ONE
fclose(output_file);
#endif
return -2;
}
/* now, if using application (1), we have a file called 'index.htm'
with the downloaded content. No redundant copies have been made, since
the library should have passed its own buffers or, at worst, only a
buffer of the biggest chunk size has been allocated through the awb
handler */
/* in the 2nd case a memory stream starting from 'buffer.base' and ending
at 'buffer.here' is available for post-processing.
Strictly speaking, CURLOPT_AWB, CURLOPT_WRITEFUNCTION & CURLOPT_WRITEDATA
were only necessary in this case.
Implementing a zero copy interface is a very complex problem and this
example is just a sketch of 'what it should look like'
*/
free(buffer.base);
curl_easy_cleanup(curl);
#ifdef APPLICATION_ONE
fclose(output_file);
#endif
return 0;
}
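
For anyone who wants to try the buffer handling without libcurl, here is a
minimal self-contained sketch in plain C of the "copy only when the data is
not already in place" rule quoted above. It is only an illustration under
stated assumptions: all names (grow_buf, round_up, gb_reserve, app_alloc,
app_write, run_demo) are hypothetical, none of this is libcurl API, and the
"sender" is simulated inside run_demo().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; nothing here is libcurl API. */
typedef struct {
    char  *base;   /* start of the allocation */
    size_t used;   /* bytes already filled */
    size_t size;   /* bytes allocated */
    int    copies; /* memcpy() calls the receiver had to make */
} grow_buf;

/* Round 'amount' up to a multiple of 'granularity', at least one unit. */
static size_t round_up(size_t amount, size_t granularity)
{
    size_t units;
    assert(granularity > 0);
    units = (amount + granularity - 1) / granularity;
    if (units == 0)
        units = 1;
    return units * granularity;
}

/* Make sure 'extra' free bytes follow the used part.
   Returns 1 on success, 0 on failure (buffer left intact). */
static int gb_reserve(grow_buf *gb, size_t extra)
{
    size_t want = gb->used + extra;
    if (want > gb->size) {
        size_t new_size = round_up(want, 1024);
        char *p = realloc(gb->base, new_size);
        if (p == NULL)
            return 0;
        gb->base = p;
        gb->size = new_size;
    }
    return 1;
}

/* "Allocate write buffer": give the sender a place to write into. */
static char *app_alloc(grow_buf *gb, size_t wanted)
{
    return gb_reserve(gb, wanted) ? gb->base + gb->used : NULL;
}

/* "Write callback": copy only when the data is not already in place. */
static size_t app_write(grow_buf *gb, const char *data, size_t len)
{
    int ours = gb->base != NULL && data == gb->base + gb->used;
    if (!ours) {
        if (!gb_reserve(gb, len))
            return 0;
        memcpy(gb->base + gb->used, data, len);
        gb->copies++;
    }
    gb->used += len;
    return len;
}

/* Simulated sender: first delivers from a private buffer (the receiver
   must copy), then produces data directly in the receiver's buffer
   (no receiver-side copy). Returns the receiver's copy count, or -1. */
static int run_demo(grow_buf *gb)
{
    static const char private_buf[] = "hello ";
    char *dst;

    if (app_write(gb, private_buf, 6) != 6)
        return -1;

    dst = app_alloc(gb, 5);
    if (dst == NULL)
        return -1;
    memcpy(dst, "world", 5); /* sender producing its data in place */
    if (app_write(gb, dst, 5) != 5)
        return -1;

    return gb->copies;
}
```

To try it, zero-initialise a grow_buf, pass it to run_demo() from a small
main(), and free() the 'base' member afterwards. Only the first delivery
should cost the receiver a memcpy(); the second one lands directly in the
application's buffer.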
Received on 2005-12-05