
libcurl read-like interface

From: XSLT2.0 via curl-library <curl-library_at_cool.haxx.se>
Date: Fri, 25 Dec 2020 18:17:51 +0100

(Continued)

After some homework, I would comment on "read".

First semantics:

fcurl_read implies the caller wants to read the "body", whatever that
means according to the protocol used.

There might still be a need to read headers (and trailers?). That
could be accommodated either via the current header callback or via a
similar fcurl_read_headers... although mixing callbacks and fcurl_read
should be strongly discouraged (and could probably fail!).


Now about "read": there are two models, blocking and non-blocking.

When writing a fuse driver (my concern), we are in a "blocking" model.
The driver receives requests with this prototype:

int read(const char *path, char *buf, size_t size, off_t offset,
         struct fuse_file_info *fi);

-path is quite obviously the path of the file being read

-fi is the "fuse" equivalent of a stream or file pointer

Unless the filesystem is mounted with direct_io, the buffer must be
filled with 'size' bytes from 'offset' of the file, and 'size' is
returned.

When the returned value is 0 or smaller than the requested 'size', it
marks the end of file.

Errors are returned as negative numbers.
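
The return convention above can be sketched with plain POSIX calls. This is only an illustration under assumptions: read_full_at is a hypothetical helper name (it is not part of FUSE), and a local file descriptor stands in for whatever the driver actually reads from.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with up to 'size' bytes starting at 'offset', FUSE-style:
 * returns 'size' when fully satisfied, a smaller count (possibly 0)
 * only at end of file, and -errno on error.
 * read_full_at is a hypothetical name used for illustration. */
static int read_full_at(int fd, char *buf, size_t size, off_t offset)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < size) {
        ssize_t n = pread(fd, buf + done, size - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted: retry the same range */
            return -errno;     /* errors are negative numbers */
        }
        if (n == 0)
            break;             /* end of file: report a short count */
        done += (size_t)n;
    }
    return (int)done;
}
```

The loop matters because a single pread() may legitimately return fewer bytes than requested, while a short return from the FUSE callback would be interpreted as end of file.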


So basically here, same as for a lot of "simple" programming, we have a
"blocking" read to do... although it does not prevent the kernel from
sending other read requests (that might even overlap!) in parallel.

That is where a simple fcurl_read would be super nice, the
(oversimplified) code of the driver could then be

int myfuse_read_callback(const char *path, char *buf, size_t size,
                         off_t offset, struct fuse_file_info *fi) {
    CURLcode res;

    /* Assuming we are at the right offset, and the curl handle is
       stored in the fi structure! */
    res = fcurl_read((CURL *)(fi->fh), buf, size);

    /* Add EOF and error checking code */

    return size;
}


Although blocking I/O can be programmed on top of non-blocking I/O, and
conversely with principles like EAGAIN, IMHO it is not quite ideal with
libcurl's current callback "architecture".

Indeed, see fcurl_read's code: it would at some point have to pull data
out of curl's internal buffer into an allocated memory pool and copy it.
There is no theoretical limit to the allocation/multiple copies besides
the file size!

That means in the worst case scenario: copies of the whole file from
socket memory to curl's buffer, from there to memory pool, and from
there to caller's buffer, plus additional copies triggered by realloc
and memory shifts when data is consumed!
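
To make the copy count concrete, here is a minimal sketch of such a memory pool. It is an assumption about how a callback-fed "read" could be built, not fcurl's actual code; struct pool, pool_write and pool_read are hypothetical names. Only the signature of pool_write follows the shape libcurl expects for a CURLOPT_WRITEFUNCTION callback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool sitting between the write callback and the reader. */
struct pool {
    char  *data;
    size_t len;   /* bytes currently buffered */
    size_t cap;   /* bytes allocated */
};

/* Producer side, shaped like a CURLOPT_WRITEFUNCTION callback.
 * One copy into the pool, and realloc may copy the whole pool again. */
static size_t pool_write(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct pool *p = userdata;
    size_t n = size * nmemb;

    if (n == 0)
        return 0;
    if (p->len + n > p->cap) {
        size_t cap = p->cap ? p->cap : 4096;
        while (cap < p->len + n)
            cap *= 2;
        char *grown = realloc(p->data, cap);
        if (!grown)
            return 0;          /* short count: libcurl aborts the transfer */
        p->data = grown;
        p->cap = cap;
    }
    memcpy(p->data + p->len, ptr, n);
    p->len += n;
    return n;
}

/* Consumer side: another copy into the caller's buffer, then the
 * remaining bytes are shifted down (the "memory shift"). */
static size_t pool_read(struct pool *p, char *buf, size_t size)
{
    size_t n = size < p->len ? size : p->len;

    assert(buf != NULL);
    if (n == 0)
        return 0;
    memcpy(buf, p->data, n);
    memmove(p->data, p->data + n, p->len - n);
    p->len -= n;
    return n;
}
```

Nothing here bounds the pool: if the reader is slower than the transfer, it grows until it holds the whole file.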

Looking at OpenSSL's BIO stack/filter architecture, this "read-like
potential inefficiency" arises because libcurl already runs the whole
stack/filter chain inside its internal buffers prior to callback
invocation.


IMHO it would be quite heavy work to do fcurl_read efficiently...
but if you can contradict me here, it means I have missed an easy way of
doing things, and I'd be glad to learn!

That would mean cleanly separating "filters" so that they can be played
in the right order both for fcurl_read and for the current callback from
internal memory.


Fortunately, for a "simple situation" like my own case, libcurl can help
with basic support, using *curl_easy_recv()*

Since my situation is http(s) file download, stored on a http/1.1 server
(for file download, http/2 is anyway counterproductive!), the "filter"
in this case is void, and we can use the stack (direct socket for http
or BIO stack through OpenSSL) that libcurl has set up for us.

Implementing a "read like" function on top of curl_easy_recv() is
trivial and straightforward, even with the "EAGAIN trick"!

This "read like" function itself will directly write data into the
caller's buffer: no "multiple intermediary copies", no memory allocation.
Excess data is left in the kernel's socket buffer where it belongs. It is
handled properly there, with kernel tools much more powerful than what we
have in userland!
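
A sketch of such a "read like" loop follows. To keep it self-contained it does not call libcurl: rx_fn is a hypothetical stand-in for curl_easy_recv(), with RX_OK and RX_AGAIN playing the roles of CURLE_OK and CURLE_AGAIN, and a toy in-memory source replaces the socket. All names below are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Status of the receive primitive; mirrors CURLE_OK / CURLE_AGAIN. */
enum rx_status { RX_OK, RX_AGAIN, RX_ERROR };

/* Non-blocking receive: stores the byte count in *n.
 * RX_OK with *n == 0 means the peer closed the connection. */
typedef enum rx_status (*rx_fn)(void *ctx, void *buf, size_t len, size_t *n);

/* Called when rx would block; a real one would poll() the socket. */
typedef int (*wait_fn)(void *ctx);

/* Blocking read: data lands directly in the caller's buffer, with no
 * intermediate pool. Returns bytes read (short only at EOF) or -1. */
static long read_like(rx_fn rx, wait_fn wait_readable, void *ctx,
                      char *buf, size_t size)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < size) {
        size_t n = 0;
        enum rx_status st = rx(ctx, buf + done, size - done, &n);
        if (st == RX_AGAIN) {          /* the "EAGAIN trick" */
            if (wait_readable(ctx) != 0)
                return -1;
            continue;
        }
        if (st != RX_OK)
            return -1;
        if (n == 0)
            break;                     /* connection closed */
        done += n;
    }
    return (long)done;
}

/* Toy source: at most 3 bytes per call, "would block" every other call. */
struct toy_source { const char *data; size_t len, pos; int starve; };

static enum rx_status toy_rx(void *ctx, void *buf, size_t len, size_t *n)
{
    struct toy_source *s = ctx;
    size_t chunk = s->len - s->pos;

    if (s->starve) { s->starve = 0; return RX_AGAIN; }
    if (chunk > 3) chunk = 3;
    if (chunk > len) chunk = len;
    memcpy(buf, s->data + s->pos, chunk);
    s->pos += chunk;
    s->starve = 1;
    *n = chunk;
    return RX_OK;
}

static int toy_wait(void *ctx) { (void)ctx; return 0; }
```

A non-blocking variant would simply return to the caller on RX_AGAIN instead of waiting, which is where a FUSE driver could look at other pending requests.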

In the case of http 1.1 (no TLS), it is even a direct copy from the
socket's memory to the caller's buffer with no added "layer" (apart from
a few error checks in curl_easy_recv()).

Of course BIO stack/filter (eg TLS) might need some buffering, but that
is always a finite amount of data that would be pulled out of the
socket. This is anyway happening whatever the "architecture"!

For my own case, fuse does not even need the "blocking" version of read,
in fact EAGAIN can give the opportunity to check if we received another
read request in parallel, and if we need to switch thread to better
serve it.

If I had to do gzip or http/2, I could always add a "filter" on top of
the last "read like" layer to do that, and still have a "read like"
feature stack.


Where libcurl could help a bit more here is by going a little further
than CURLOPT_CONNECT_ONLY

As the name implies, it makes libcurl return after the connection is
made (function multi_runsingle at the start of state "CURLM_STATE_DO")

So for an "http 1.1 download", my program now has to sort of replicate
parts of the code of http.c where the HTTP request is prepared and sent.

Here also, as far as I understood libcurl's code, I don't see that
the "do" phase is modeled as sending/receiving. Indeed that does not
always make sense: I think of an FTP upload, where you need several
send/receive exchanges and another socket!

The ideal would be: connect + do all the talking 'pre-talking', then
return the socket same as CURLOPT_CONNECT_ONLY does.

There might be a need for another "step" here, whose semantics depends
on the situation... if that makes sense!


That is of course not a big deal, in my case I will oversimplify it to
the bare minimum I need for my requests: GET, host, (user-agent), range.
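
As an illustration of that bare minimum, the request text could be formatted as below. This is a sketch under assumptions: build_range_get, its parameters and the "myfuse/0.1" user-agent string are hypothetical, and it only builds the bytes to send.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.1 range request into 'out'.
 * Returns the request length, or -1 if it does not fit.
 * The Range header is inclusive: bytes=first-last. */
static int build_range_get(char *out, size_t outlen,
                           const char *host, const char *path,
                           unsigned long long offset, size_t size)
{
    int n;

    assert(size > 0);   /* an empty range cannot be expressed */
    n = snprintf(out, outlen,
                 "GET %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "User-Agent: myfuse/0.1\r\n"   /* hypothetical value */
                 "Range: bytes=%llu-%llu\r\n"
                 "\r\n",
                 path, host, offset,
                 (unsigned long long)(offset + size - 1));
    if (n < 0 || (size_t)n >= outlen)
        return -1;      /* formatting error or truncated output */
    return n;
}
```

On a handle connected with CURLOPT_CONNECT_ONLY, the resulting buffer would be handed to curl_easy_send(). The bytes that come back through curl_easy_recv() then start with the status line and response headers, which the program has to parse itself before it reaches the body.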

So in fact, I'm glad that my "homework" made me discover the well hidden
*curl_easy_recv()*. I can do something interesting with that, and
still use "classic callbacks" for WebAPI calls or the like.

That's all I needed, because "read-like" on top of *curl_easy_recv()* is
so trivial, and it is still a pretty good help to hide TLS complexity,
and have the same caller's code for plain http or https.


Cheers, Merry Christmas, and keep up the good job.

Alain

Received on 2020-12-25