curl-users
Re: The fastest way to download a list of URLs
Date: Sun, 12 Oct 2014 10:36:40 +0200
On Sat, Oct 11, 2014 at 05:38:35PM -0300, Rodrigo Zanatta Silva wrote:
> I am writing a console program and need to download 10K URLs. I asked this
> question on Stack Overflow.
>
> I just realized I can write files any way I want. So, I can write a bash
> script with all the curl commands. And because I want to create N threads, I can
> write N bash script files with 10K/N lines each and run the N files at the same
> time in the background.
>
> Is this the easy and best strategy to speed up the downloads?
It's a strategy, but probably not the best. For one thing, at some point your
Internet connection will become saturated, and that's far more likely to happen
with 10K simultaneous downloads than with 10. Where exactly that point lies
depends on the relative speeds of your local Internet connection and the remote
sites' connections, and on how congested the links between them are.
Your suggested technique of parallel downloads will work, but you'll want to
batch them instead of starting all 10K at once. Take a look at the xargs
program's -P option, and at the program called parallel. Either of those can
run curl downloads in parallel without your having to hand-craft bash scripts
at all.
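As a rough sketch (assuming the URLs sit in a file called urls.txt, one per
line, and that 10 at a time is a reasonable batch size), either of these should
do it:

  # xargs: -n 1 passes one URL per curl invocation, -P 10 keeps up to
  # 10 curl processes running at once; -O saves under the remote file name
  xargs -n 1 -P 10 curl -s -O < urls.txt

  # GNU parallel: same idea, -j 10 caps the number of concurrent jobs
  parallel -j 10 curl -s -O {} < urls.txt

The file name and the batch size of 10 are just placeholders; tune the batch
size until your connection, not the batching, is the bottleneck.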
> Isn't there a program/command that I can feed a list of URLs, along with the
> paths to save them to, and it will download them in N threads as fast as my
> computer can?
The curl command-line tool isn't designed for that, but there may be other
programs that will do it. If you're downloading large files, you can also look
at parallelizing the individual downloads themselves, which is something that
some programs (e.g. aria2) can do.
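As one example (again just a sketch, assuming the same urls.txt list), aria2
can take the whole list, limit how many downloads run at once, and also split
each download across several connections to the same server:

  # -i reads the URL list, -j caps concurrent downloads,
  # -x/-s set how many connections each single download is split into
  aria2c -i urls.txt -j 10 -x 4 -s 4

Its input file format also lets you put per-URL options such as out= or dir=
on an indented line after each URL, which would cover the "path to save" part
of your question.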
>>> Dan
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2014-10-12