How can libcurl/curl_multi perform like curl --parallel ?

From: Oliver Schonrock via curl-library <curl-library_at_lists.haxx.se>
Date: Fri, 25 Oct 2024 08:58:33 +0100

Basically, I want to do something very similar to the command below, but
programmatically, as part of a C++ application using libcurl:

curl --retry 10 --retry-all-errors --remote-name-all --parallel \
  --parallel-max 150 \
  "https://api.pwnedpasswords.com/range/000{0,1,2,3}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}" \
  > curl.log 2>&1

The above retrieves 64 text files, each about 32 kB. On a cheap VM with
a Gbit internet connection, this takes only about 0.2 seconds. Awesome.
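For reference, curl's URL globbing turns that single pattern into 4 × 16 = 64 URLs. A minimal C sketch that builds the same list in-process (instead of passing 64 arguments on the command line) might look like this; `expand_range_urls`, `BASE_URL`, `URL_COUNT` and `URL_LEN` are hypothetical names chosen for illustration, not part of curl or libcurl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constants for this sketch; none of them come from libcurl. */
#define BASE_URL  "https://api.pwnedpasswords.com/range/000"
#define URL_COUNT 64  /* 4 x 16 combinations */
#define URL_LEN   64  /* room for BASE_URL, two more characters and the NUL */

/* sizeof includes the terminating NUL, so this checks that the longest URL fits. */
static_assert(sizeof(BASE_URL) + 2 <= URL_LEN, "URL_LEN too small");

static const char HEX[] = "0123456789ABCDEF";

/* Builds the same 64 URLs as the glob 000{0,1,2,3}{0,...,F}.
 * The last character varies fastest. Returns the number of URLs written. */
int expand_range_urls(char urls[URL_COUNT][URL_LEN])
{
    int n = 0;
    for (int first = 0; first < 4; first++)          /* {0,1,2,3} */
        for (int second = 0; second < 16; second++)  /* {0,...,F} */
            snprintf(urls[n++], URL_LEN, "%s%c%c", BASE_URL, HEX[first], HEX[second]);
    assert(n == URL_COUNT);
    return n;
}
```

Each resulting string can then be handed to `CURLOPT_URL` on its own easy handle.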

I started with this example from the official site:

https://curl.se/libcurl/c/multi-event.html

I used it verbatim.

I compiled it like this:
gcc -O3 -Wall -Wextra -Wno-unused-parameter -std=c11 -o multi multi.c -lcurl -levent

I am on Ubuntu 24.04:
$ uname -a
Linux oliver 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l | egrep 'libcurl|libevent|openssl' | awk '{print $2,$3}' | column -t
libcurl3t64-gnutls:amd64 8.5.0-2ubuntu10.4
libcurl4t64:amd64 8.5.0-2ubuntu10.4
libevent-2.1-7t64:amd64 2.1.12-stable-9ubuntu2
libevent-core-2.1-7t64:amd64 2.1.12-stable-9ubuntu2
libevent-dev 2.1.12-stable-9ubuntu2
libevent-extra-2.1-7t64:amd64 2.1.12-stable-9ubuntu2
libevent-openssl-2.1-7t64:amd64 2.1.12-stable-9ubuntu2
libevent-pthreads-2.1-7t64:amd64 2.1.12-stable-9ubuntu2
openssl 3.0.13-0ubuntu3.4

If I pass the same 64 URLs to the resulting program as `argv`, like this:
./multi \
      "https://api.pwnedpasswords.com/range/00000" \
      "https://api.pwnedpasswords.com/range/00001" \
      "https://api.pwnedpasswords.com/range/00002" \
      "https://api.pwnedpasswords.com/range/00003" \
      "https://api.pwnedpasswords.com/range/00004" \
      "https://api.pwnedpasswords.com/range/00005" \
      "https://api.pwnedpasswords.com/range/00006" \
      "https://api.pwnedpasswords.com/range/00007" \
      "https://api.pwnedpasswords.com/range/00008" \
      "https://api.pwnedpasswords.com/range/00009" \
      "https://api.pwnedpasswords.com/range/0000A" \
      "https://api.pwnedpasswords.com/range/0000B" \
      "https://api.pwnedpasswords.com/range/0000C" \
      "https://api.pwnedpasswords.com/range/0000D" \
      "https://api.pwnedpasswords.com/range/0000E" \
      "https://api.pwnedpasswords.com/range/0000F" \
      "https://api.pwnedpasswords.com/range/00010" \
      "https://api.pwnedpasswords.com/range/00011" \
      "https://api.pwnedpasswords.com/range/00012" \
      "https://api.pwnedpasswords.com/range/00013" \
      "https://api.pwnedpasswords.com/range/00014" \
      "https://api.pwnedpasswords.com/range/00015" \
      "https://api.pwnedpasswords.com/range/00016" \
      "https://api.pwnedpasswords.com/range/00017" \
      "https://api.pwnedpasswords.com/range/00018" \
      "https://api.pwnedpasswords.com/range/00019" \
      "https://api.pwnedpasswords.com/range/0001A" \
      "https://api.pwnedpasswords.com/range/0001B" \
      "https://api.pwnedpasswords.com/range/0001C" \
      "https://api.pwnedpasswords.com/range/0001D" \
      "https://api.pwnedpasswords.com/range/0001E" \
      "https://api.pwnedpasswords.com/range/0001F" \
      "https://api.pwnedpasswords.com/range/00020" \
      "https://api.pwnedpasswords.com/range/00021" \
      "https://api.pwnedpasswords.com/range/00022" \
      "https://api.pwnedpasswords.com/range/00023" \
      "https://api.pwnedpasswords.com/range/00024" \
      "https://api.pwnedpasswords.com/range/00025" \
      "https://api.pwnedpasswords.com/range/00026" \
      "https://api.pwnedpasswords.com/range/00027" \
      "https://api.pwnedpasswords.com/range/00028" \
      "https://api.pwnedpasswords.com/range/00029" \
      "https://api.pwnedpasswords.com/range/0002A" \
      "https://api.pwnedpasswords.com/range/0002B" \
      "https://api.pwnedpasswords.com/range/0002C" \
      "https://api.pwnedpasswords.com/range/0002D" \
      "https://api.pwnedpasswords.com/range/0002E" \
      "https://api.pwnedpasswords.com/range/0002F" \
      "https://api.pwnedpasswords.com/range/00030" \
      "https://api.pwnedpasswords.com/range/00031" \
      "https://api.pwnedpasswords.com/range/00032" \
      "https://api.pwnedpasswords.com/range/00033" \
      "https://api.pwnedpasswords.com/range/00034" \
      "https://api.pwnedpasswords.com/range/00035" \
      "https://api.pwnedpasswords.com/range/00036" \
      "https://api.pwnedpasswords.com/range/00037" \
      "https://api.pwnedpasswords.com/range/00038" \
      "https://api.pwnedpasswords.com/range/00039" \
      "https://api.pwnedpasswords.com/range/0003A" \
      "https://api.pwnedpasswords.com/range/0003B" \
      "https://api.pwnedpasswords.com/range/0003C" \
      "https://api.pwnedpasswords.com/range/0003D" \
      "https://api.pwnedpasswords.com/range/0003E" \
      "https://api.pwnedpasswords.com/range/0003F" \
      ;


It retrieves the files correctly, but takes 3 seconds. `top` shows 100% CPU,
i.e. it is CPU-bound.

That is 15x slower (3 s vs. 0.2 s).

That makes this approach infeasible, as I need to retrieve 1 million such files.
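To put the two timings in perspective, here is a naive extrapolation to 1 million files in batches of 64. It assumes the observed per-batch time stays constant at scale, which is an assumption rather than a measurement; `estimated_seconds` is a hypothetical helper.

```c
#include <assert.h>

/* Naive linear extrapolation (an assumption, not a benchmark): every batch of
 * batch_files files is assumed to take batch_seconds, however many batches run. */
double estimated_seconds(double total_files, double batch_files, double batch_seconds)
{
    assert(batch_files > 0);
    return total_files / batch_files * batch_seconds;
}
```

With the figures above, 1,000,000 / 64 = 15,625 batches, giving about 3,125 s (roughly 52 minutes) at 0.2 s per batch versus about 46,875 s (roughly 13 hours) at 3 s per batch.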

This article

https://daniel.haxx.se/docs/poll-vs-select.html

suggests that using curl_multi in an event-based way is the fastest approach.
That's why I chose the example that uses libevent.

I checked the curl_multi options:
https://curl.se/libcurl/c/multi_setopt_options.html

to make sure I was getting connection pooling (all URLs are on the same
domain, etc.), and I didn't find anything to suggest I was not. The server
for the URLs above offers HTTP/2 with TLS 1.3. I haven't checked, but could
`curl --parallel` be using a single connection with HTTP/2 streams?

What is `curl --parallel --parallel-max 150` doing internally and how
can I reproduce this performance with libcurl?

Many thanks for any help/input.

Oliver
Received on 2024-10-25