
Re: How to use curl -Z (--parallel) effectively?

From: rs _ via curl-users <curl-users_at_lists.haxx.se>
Date: Tue, 8 Mar 2022 16:05:03 -0600

> Where are these solutions documented? Not in
https://everything.curl.dev/cmdline/urls/parallel (very useful anyway thx
to the author). Not found in man...

The second solution can be found in the section for `-K/--config <config
file>` in https://curl.se/docs/manpage.html
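For reference, a config file in the shape that manpage section describes looks like this (the file names and URLs below are placeholders, not from the original post):

```
url = "http://site/path/to/file1"
output = "path/to/file1"

url = "http://site/path/to/file2"
output = "path/to/file2"
```

You would then run it with something like `curl --parallel --create-dirs --config urls.txt`.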

The first solution could be considered standard practice when passing
arguments to other programs, since it relies on a feature of your shell
rather than on curl itself. Therefore, it doesn't need to be part of
curl's documentation.
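To illustrate the shell-array approach, here is a hypothetical bash sketch (paths and URLs are placeholders, and the actual curl call is left commented out):

```shell
# Hypothetical sketch: build -o/URL pairs in a bash array, then expand the
# array into a single curl invocation. All names below are placeholders.
args=()
for name in file1 file2; do
  args+=(-o "path/to/$name" "http://site/path/to/$name")
done
# Quoted array expansion keeps each element a single word, even if a path
# contained spaces or quotes:
printf '%s\n' "${args[@]}"
# curl -Z --create-dirs "${args[@]}"   # the actual download, commented out here
```

The point of the array is that the shell, not curl, handles the quoting, so arbitrary file names survive the expansion intact.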

> Does --parallel|-Z completely replace xargs -P x or GNU Parallel in
terms of performance?

curl should be a lot faster than GNU parallel, but parallel provides much
more flexibility. A simple example fetching 500 domains shows the
following results:

time curl --parallel --parallel-immediate --parallel-max 5 --config config.txt

real 0m51.699s
user 0m6.311s
sys 0m0.977s

time parallel -j5 'curl {} > {#}' :::: domains.txt

real 2m29.150s
user 0m22.677s
sys 0m11.311s
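The config.txt used in the first timing isn't shown; for completeness, a file in that shape could be generated from a plain URL list with a small loop (the sample URLs below are placeholders):

```shell
# Hypothetical sketch: turn a plain URL list (domains.txt) into a curl
# config file, numbering the output files by input line.
printf '%s\n' 'https://example.com' 'https://example.org' > domains.txt  # sample input
n=0
while IFS= read -r url; do
  n=$((n + 1))
  printf 'url = "%s"\noutput = "%d"\n' "$url" "$n"
done < domains.txt > config.txt
# Then: time curl --parallel --parallel-immediate --parallel-max 5 --config config.txt
```

Numbering the outputs by input line mirrors what `{#}` does in the GNU parallel command above, so the two directories can be compared file for file.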

However, comparing both output directories, there are many more empty
files with curl --parallel:

8.4M curl
44M parallel

Most of the empty files can be found at the end of the list, so maybe curl
retrieves the domains so fast that it easily triggers firewall protections.
I wouldn't be so concerned about performance with this sort of thing.
Usually, network latency will be your bottleneck, not so much the tool used
to download files.

By the way, I'm running curl with the following .curlrc file:

User-Agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
insecure
silent
connect-timeout = 5

On Tue, Mar 8, 2022 at 8:18 AM jacques granduel via curl-users <
curl-users_at_lists.haxx.se> wrote:

> Thanks for your answers.
> I tried the array way on PowerShell and it worked out indeed
>
> $ar = @("-o", "path/to/file1", "http://site/path/to/file1", "-o",
> "path/to/file2", "http://site/path/to/file2")
> curl -Z $ar # or curl --create-dirs --parallel --parallel-immediate
> --parallel-max 10 $ar
>
> I also tried the config file having this pattern
>
> url=http://site/path/to/file1
> output=path/to/file
>
> Where are these solutions documented? Not in
> https://everything.curl.dev/cmdline/urls/parallel (very useful anyway thx
> to the author). Not found in man...
>
> Does --parallel|-Z replace completely xargs -P x or Gnu Parallel in term
> of performance?
>
> Thanks again to all.
>
>
>
> On Sun, Mar 6, 2022 at 13:24, jacques granduel <jgrnduel_at_gmail.com>
> wrote:
>
>> Hi Curl Community,
>>
>> I have posted 2 questions on StackOverflow
>> <https://stackoverflow.com/questions/71244217/how-to-use-curl-z-parallel-effectively>
>> as I thought I would get a very quick answer but didn't get any! I would
>> like to get an answer anyway, for me and SOF users, if someone comes
>> across it. Sorry in advance for this double posting.
>> Here's my question:
>>
>> I need to download thousands of files with *curl*. I know how to
>> parallelize with xargs -Pn (or gnu parallel) but I've just discovered
>> curl itself can parallelize downloads with the argument -Z|--parallel
>> introduced in *curl-7.66* (see curl-goez-parallel
>> <https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/>) which
>> might be cleaner or easier to share. I need to use -o|--output option
>> and --create-dirs. URLs need to be *percent-encoded*, the URL path
>> becoming the folder path, which also needs to be escaped as paths can
>> contain single quotes, spaces, and the usual suspects (hence the -O option
>> is not safe and -OJ doesn't help). If I understand well, the curl command
>> should be built like so:
>>
>> curl -Z -o path/to/file1 http://site/path/to/file1 -o path/to/file2 http://site/path/to/file2 [-o path/to/file3 http://site/path/to/file3, etc.]
>>
>> This works indeed, but what's the best way to deal with thousands of
>> URLs? Can a config file used with -K config be useful? What if the -o
>> path/to/file_x http://site/path/to/file_x pairs are the output of another
>> program? I haven't found any way to record commands in a file, one command
>> per line, say.
>>
>> Thanks in advance if you can give me any tip!
>>
>> Best regards
>>


-- 
Unsubscribe: https://lists.haxx.se/listinfo/curl-users
Etiquette:   https://curl.haxx.se/mail/etiquette.html
Received on 2022-03-09