How can libcurl/curl_multi perform like curl --parallel?
From: Oliver Schonrock via curl-library <curl-library_at_lists.haxx.se>
Date: Fri, 25 Oct 2024 08:58:33 +0100
Basically, I want to do something very similar to the command below, but
programmatically, as part of a C++ application using libcurl:
curl --retry 10 --retry-all-errors --remote-name-all --parallel \
--parallel-max 150 \
"https://api.pwnedpasswords.com/range/000{0,1,2,3}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}" \
> curl.log 2>&1
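For a libcurl program there is no shell-style globbing to lean on, so the URL list has to be built in code. The following is a minimal sketch of one way to generate the same 64 URLs in ascending order; `build_range_urls`, `URL_COUNT` and `URL_MAX` are hypothetical names chosen for illustration, not part of libcurl or of the multi-event example.

```c
#include <stdio.h>
#include <stddef.h>

#define URL_COUNT 64   /* 4 values for the fourth digit x 16 for the fifth */
#define URL_MAX   64   /* per-URL buffer size, ample for these 42-char URLs */

/* Fill urls[] with the 64 range URLs 00000, 00001, ..., 0000F, 00010, ...,
   0003F. Returns the number of URLs written. */
static size_t build_range_urls(char urls[URL_COUNT][URL_MAX])
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;

  for(int hi = 0; hi < 4; hi++) {        /* fourth digit: 0..3 */
    for(int lo = 0; lo < 16; lo++) {     /* fifth digit: 0..F */
      snprintf(urls[n], URL_MAX,
               "https://api.pwnedpasswords.com/range/000%c%c",
               hex[hi], hex[lo]);
      n++;
    }
  }
  return n;
}
```

Each `urls[i]` can then be handed to `curl_easy_setopt(easy, CURLOPT_URL, urls[i])` when the transfers are created.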
The above retrieves 64 text files, each about 32 kB. On a cheap VM with
a Gbit internet connection this takes only about 0.2 seconds. Awesome.
I started with this example
https://curl.se/libcurl/c/multi-event.html
from the official site, used verbatim.
Compiled like this:
gcc -O3 -Wall -Wextra -Wno-unused-parameter -std=c11 -o multi multi.c \
-lcurl -levent
I am on Ubuntu 24.04:
$ uname -a
Linux oliver 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -l | egrep 'libcurl|libevent|openssl' | awk '{print $2,$3}' |
column -t
libcurl3t64-gnutls:amd64          8.5.0-2ubuntu10.4
libcurl4t64:amd64                 8.5.0-2ubuntu10.4
libevent-2.1-7t64:amd64           2.1.12-stable-9ubuntu2
libevent-core-2.1-7t64:amd64      2.1.12-stable-9ubuntu2
libevent-dev                      2.1.12-stable-9ubuntu2
libevent-extra-2.1-7t64:amd64     2.1.12-stable-9ubuntu2
libevent-openssl-2.1-7t64:amd64   2.1.12-stable-9ubuntu2
libevent-pthreads-2.1-7t64:amd64  2.1.12-stable-9ubuntu2
openssl                           3.0.13-0ubuntu3.4
If I pass the same 64 URLs to the resulting program as `argv` like this:
./multi \
"https://api.pwnedpasswords.com/range/00000" \
"https://api.pwnedpasswords.com/range/00001" \
"https://api.pwnedpasswords.com/range/00002" \
"https://api.pwnedpasswords.com/range/00003" \
"https://api.pwnedpasswords.com/range/00004" \
"https://api.pwnedpasswords.com/range/00005" \
"https://api.pwnedpasswords.com/range/00006" \
"https://api.pwnedpasswords.com/range/00007" \
"https://api.pwnedpasswords.com/range/00008" \
"https://api.pwnedpasswords.com/range/00009" \
"https://api.pwnedpasswords.com/range/0000A" \
"https://api.pwnedpasswords.com/range/0000B" \
"https://api.pwnedpasswords.com/range/0000C" \
"https://api.pwnedpasswords.com/range/0000D" \
"https://api.pwnedpasswords.com/range/0000E" \
"https://api.pwnedpasswords.com/range/0000F" \
"https://api.pwnedpasswords.com/range/00010" \
"https://api.pwnedpasswords.com/range/00011" \
"https://api.pwnedpasswords.com/range/00012" \
"https://api.pwnedpasswords.com/range/00013" \
"https://api.pwnedpasswords.com/range/00014" \
"https://api.pwnedpasswords.com/range/00015" \
"https://api.pwnedpasswords.com/range/00016" \
"https://api.pwnedpasswords.com/range/00017" \
"https://api.pwnedpasswords.com/range/00018" \
"https://api.pwnedpasswords.com/range/00019" \
"https://api.pwnedpasswords.com/range/0001A" \
"https://api.pwnedpasswords.com/range/0001B" \
"https://api.pwnedpasswords.com/range/0001C" \
"https://api.pwnedpasswords.com/range/0001D" \
"https://api.pwnedpasswords.com/range/0001E" \
"https://api.pwnedpasswords.com/range/0001F" \
"https://api.pwnedpasswords.com/range/00020" \
"https://api.pwnedpasswords.com/range/00021" \
"https://api.pwnedpasswords.com/range/00022" \
"https://api.pwnedpasswords.com/range/00023" \
"https://api.pwnedpasswords.com/range/00024" \
"https://api.pwnedpasswords.com/range/00025" \
"https://api.pwnedpasswords.com/range/00026" \
"https://api.pwnedpasswords.com/range/00027" \
"https://api.pwnedpasswords.com/range/00028" \
"https://api.pwnedpasswords.com/range/00029" \
"https://api.pwnedpasswords.com/range/0002A" \
"https://api.pwnedpasswords.com/range/0002B" \
"https://api.pwnedpasswords.com/range/0002C" \
"https://api.pwnedpasswords.com/range/0002D" \
"https://api.pwnedpasswords.com/range/0002E" \
"https://api.pwnedpasswords.com/range/0002F" \
"https://api.pwnedpasswords.com/range/00030" \
"https://api.pwnedpasswords.com/range/00031" \
"https://api.pwnedpasswords.com/range/00032" \
"https://api.pwnedpasswords.com/range/00033" \
"https://api.pwnedpasswords.com/range/00034" \
"https://api.pwnedpasswords.com/range/00035" \
"https://api.pwnedpasswords.com/range/00036" \
"https://api.pwnedpasswords.com/range/00037" \
"https://api.pwnedpasswords.com/range/00038" \
"https://api.pwnedpasswords.com/range/00039" \
"https://api.pwnedpasswords.com/range/0003A" \
"https://api.pwnedpasswords.com/range/0003B" \
"https://api.pwnedpasswords.com/range/0003C" \
"https://api.pwnedpasswords.com/range/0003D" \
"https://api.pwnedpasswords.com/range/0003E" \
"https://api.pwnedpasswords.com/range/0003F" \
;
It gets the files OK, but takes 3 seconds. `top` shows 100% CPU, i.e. it
is CPU bound.
That is 15x slower.
That makes this approach infeasible, as I need to retrieve 1 million such files.
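To put numbers on that, here is a rough back-of-the-envelope sketch that scales the two measured batch times up to the full job. It assumes throughput stays constant from batch to batch, which is a simplification (no server-side rate limiting, no warm-up effects); `estimate_total_seconds` is a hypothetical helper name.

```c
/* Estimated wall-clock seconds to fetch total_files, given that one batch
   of batch_files takes batch_seconds. Assumes every batch takes the same
   time, which is only an approximation. */
static double estimate_total_seconds(double total_files, double batch_files,
                                     double batch_seconds)
{
  return total_files / batch_files * batch_seconds;
}
```

With the figures above, 1,000,000 files is 15,625 batches of 64. At 3 seconds per batch that comes to about 46,875 seconds (roughly 13 hours), against about 3,125 seconds (roughly 52 minutes) at 0.2 seconds per batch.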
This article
https://daniel.haxx.se/docs/poll-vs-select.html
suggests that event-based curl_multi is the fastest. That's why I
chose that example using libevent.
I checked the curl_multi options:
https://curl.se/libcurl/c/multi_setopt_options.html
to ensure I was getting connection pooling (all on the same domain
etc), and I didn't find anything to suggest I was not. The server for
the URLs above offers HTTP/2 with TLS 1.3. I haven't checked, but
potentially `curl --parallel` is using a single connection with HTTP/2
streams?
What is `curl --parallel --parallel-max 150` doing internally and how
can I reproduce this performance with libcurl?
Many thanks for any help/input.
Oliver
Received on 2024-10-25