
Critique our use of curl; and HTTP/2, /3, multiplexing & pipelining questions

From: Richard W.M. Jones via curl-library <curl-library_at_lists.haxx.se>
Date: Tue, 21 Feb 2023 11:12:57 +0000

Hello curl developers,

We use libcurl quite extensively in an open source program called
nbdkit. https://gitlab.com/nbdkit/nbdkit/-/tree/master/plugins/curl

nbdkit is a server which serves Network Block Device (NBD) on one side
and (in the case where we use curl) forwards to a web server. An
example usage might be:

  $ nbdkit -r curl https://example.com/disk.img

Then we'd access the server using the NBD protocol on port 10809 in
order to random-access 'disk.img'. eg. to boot the disk in a VM you
might do something like:

  $ qemu-system-x86_64 -M q35,accel=kvm -cpu host -m 2048 -drive file.driver=nbd,file.host=localhost,if=virtio,snapshot=on

Or download the image to a local file using:

  $ nbdcopy nbd://localhost disk.img

(nbdcopy uses multiple threads and is not really like a curl download).

Modern NBD is a fast, efficient protocol that supports pipelining
requests and spreading requests across multiple connections to the
server (nbdkit).

Recently we've been having debates in the upstream community about
performance of our curl-based plugin. Performance has become a
paramount concern for some of our users, even if the implementation
becomes complex.

As currently implemented, nbdkit will start many threads (16+) and
dispatch NBD requests to the curl plugin from those threads in
parallel[1]. There is a pool of (by default 4) libcurl easy handles,
all configured in the same way. When a request comes in, it picks a
free handle (or waits for one to become free) and then synchronously
makes the HTTP/HTTPS request using WRITEFUNCTION/WRITEDATA +
curl_easy_perform.

           nbdkit                  nbdkit-curl-plugin

  --------------------->   \         +--------+
  --------------------->    \        | CURL* --------> web server
  --------------------->             | CURL* --------> web server
  --------------------->    /        | CURL* --------> web server
  --------------------->   /         | CURL* --------> web server
  --------------------->  get/put    +--------+
  --------------------->             handles pool
    threads, each doing
    a single NBD request + reply

Observing the current plugin shows that curl is opening up to 4 TCP
connections to the web server, as expected.

Firstly, I don't understand whether the multi interface would actually help
us here. Because nbdkit gives us lots of threads and expects an NBD
request to be processed synchronously on that thread, using the easy
interface is a natural .. easy(!) .. fit.

We could create a separate, new pool of threads, eg. one per CURL*
handle, but that seems like it would add more overhead as we pass
requests from the nbdkit threads to the new thread pool using some
kind of queue structure.

A second thing I'm unclear about with multi is whether the individual
easy handles which are added are related in any way -- eg. if they all
share the same TCP connection to the web server? Reading the page
makes me think this is not the case: the multi interface is just a way
to group easy handles for the purposes of using a select/poll or
event-driven API, and apart from that there is no relationship.

The third and main concern is whether we are using curl most
efficiently. In particular, whether we are using HTTP/2 (and in
future HTTP/3) as efficiently as we could be (eg. exploiting multiplexing).

I notice that HTTP/1.1-style pipelining was removed from curl, and I
suppose HTTP/2 multiplexing is meant to replace this. However since
we are using the easy interface and doing everything synchronously,
it's my understanding that we are not exploiting multiplexing, unless
curl itself does something clever internally.

Any comments on this design and thoughts on ways we could improve
things are most welcome.

TIA,

Rich.

----------------------------------------------------------------------

Notes

[1] Actually not in parallel right now, because we found that fully
parallel requests caused the plugin to slow down. However, if you
patch nbdkit-curl-plugin like this then it works the way I describe
above:

diff --git a/plugins/curl/curl.c b/plugins/curl/curl.c
index 70c0a9ec9..47c9d7d41 100644
--- a/plugins/curl/curl.c
+++ b/plugins/curl/curl.c
@@ -474,7 +474,7 @@ curl_close (void *handle)
  * of pessimising common workloads. See:
  * https://listman.redhat.com/archives/libguestfs/2023-February/030618.html
  */
-#define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS
+#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
 
 /* Calls get_handle() ... put_handle() to get a handle for the length
  * of the current scope.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
Received on 2023-02-21