
curl-users

Threading/forking support in curl

From: Seth Mos <seth.mos_at_xs4all.nl>
Date: Wed, 08 Sep 2004 14:45:18 +0200

Hello,

I am looking for threading/forking support in a download manager like
curl or wget. However, this seems to be a missing feature.

If it does exist in curl, please point it out.

Rambling below.

The idea is to fetch the first page and then fork a separate process for
fetching each of the sub-pages. This is of course a good way to kill off
a DSL line, so it needs a config option to limit the number of forks to,
say, 5 or some specified value.
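
To make the idea concrete, a minimal sketch of the kind of cap I mean,
assuming the sub-page URLs have already been extracted into a file
(urls.txt is just a made-up name) and that GNU xargs with its -P option
is available:

  # run at most 5 wget processes at a time, one URL per invocation
  xargs -n 1 -P 5 wget -q -nH -P log/ < urls.txt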

The need for this asynchronous fetching arises when the server hosting
the page is on a relatively high-latency link (not necessarily low
speed), so serializing the request queue turns into a huge traffic jam
and takes a very long time.

I did find mget in the FreeBSD ports, but the utility and its maker seem
to have gone someplace else.

If anyone can point me to such a utility it would be most welcome.

If there is a way in PHP to fetch web pages asynchronously, that would be
great as well. I thought of using PHP and then calling:
exec("wget -o log/wget_fetch.log -q -nH -P log/ $mp_url")

However, this is not quite the solution, since it would spawn a huge
number of processes (over 30 in this case) and swamp both the line and
the server being fetched from. Using a locking scheme in PHP bumps into
the execution time limit, which is already set to a high 30 minutes.
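
One workaround I have been toying with is to exec() a single wrapper
script once (something like "bash fetch_limited.sh < urls.txt") and let
the script do the limiting instead of PHP. Only a rough sketch, and
fetch_limited.sh and urls.txt are names I made up:

  #!/bin/bash
  # fetch_limited.sh: read URLs from stdin, keep at most 5 wgets running
  while read url; do
      wget -q -nH -P log/ "$url" &
      # "jobs -r" (list running jobs only) is bash-specific
      while [ "$(jobs -r | wc -l)" -ge 5 ]; do
          sleep 1
      done
  done
  wait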

So it's a tricky problem. Short of writing my own script with a locking
and fetching scheme (which is a drag, what with keeping track of
duplicate URLs etc.), I could imagine that other people using the spider
options would like it as well.

I did write a shell script utility for something else already which has
a process limiter, but I still have the "did we fetch this page yet?"
problem, which for large sites is probably best solved with something
like a DB, I guess.
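
The crudest bookkeeping I can think of for that is a flat file of
already-fetched URLs, checked before and appended to after each fetch,
roughly like the fragment below (seen.txt is a made-up name, and this
ignores locking between concurrent fetchers):

  # skip the fetch if we already got this URL, otherwise fetch and record it
  if ! grep -qxF "$url" seen.txt 2>/dev/null; then
      wget -q -nH -P log/ "$url"
      echo "$url" >> seen.txt
  fi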

Cheers
Seth
Received on 2004-09-08