curl-users
Threading/forking support in curl
Date: Wed, 08 Sep 2004 14:45:18 +0200
Hello,
I am in search of threading/forking support in a download manager like
curl or wget. However, this seems to be a missing option.
If it does exist in curl, please point it out.
Rambling below.
The idea is to fetch the first page and then fork a separate process for
fetching each of the sub-pages. This is of course a good way to kill off a
DSL line, so it needs a config option to limit the number of forks to,
say, 5 or whatever the user specifies.
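Very roughly, something like this is what I mean (just a sketch, assuming
the PHP CLI with the pcntl extension; $sub_urls and the cap of 5 are
placeholders I made up):

  <?php
  // Sketch: fork one child per sub-page, never more than $max_forks at once.
  $sub_urls  = array('http://example.com/a', 'http://example.com/b');
  $max_forks = 5;
  $running   = 0;

  foreach ($sub_urls as $url) {
      if ($running >= $max_forks) {
          pcntl_wait($status);   // block until one child finishes
          $running--;
      }
      $pid = pcntl_fork();
      if ($pid == -1) {
          die("fork failed\n");
      } elseif ($pid == 0) {
          // child: fetch one sub-page and exit
          exec('curl -s -O ' . escapeshellarg($url));
          exit(0);
      }
      $running++;
  }
  while ($running-- > 0) {
      pcntl_wait($status);       // reap the remaining children
  }
  ?>

No error handling or duplicate checking in there, obviously, but that is
the kind of capped forking I am after.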
The need for this asynchronous fetching arises when the server hosting
the page is on a relatively high-latency link (not necessarily low
speed): serializing the request queue then turns into a huge traffic
jam and takes a really long time.
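Just to illustrate with made-up numbers: at roughly 500 ms of round-trip
latency per request, 30 sub-pages fetched one after the other spend about
15 seconds doing nothing but waiting, while fetching 5 at a time would cut
that waiting to about 3 seconds.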
I did find mget in the FreeBSD ports, but the utility and its maker seem
to have gone someplace else.
If anyone can point me to such a utility it would be most welcome.
If there is a way in PHP to fetch web pages asynchronously, that would be
great as well. I thought of using a PHP script and then calling:
exec("wget -o log/wget_fetch.log -q -nH -P log/ $mp_url");
However, this is not quite the solution, since it would spawn a huge
number of processes (over 30 in this case) and would swamp both the line
and the server being fetched from. Using a locking scheme in PHP runs into
PHP's max_execution_time limit, even with it already set to a high 30 minutes.
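If PHP 5's curl_multi_* functions are what I think they are, they might
sidestep both problems by driving all the transfers from a single process.
A rough sketch of what I imagine (the $urls list is just a placeholder):

  <?php
  // Sketch: fetch a batch of pages concurrently from one PHP process
  // using the curl_multi interface (PHP 5).
  $urls = array('http://example.com/page1', 'http://example.com/page2');

  $mh = curl_multi_init();
  $handles = array();
  foreach ($urls as $url) {
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_multi_add_handle($mh, $ch);
      $handles[$url] = $ch;
  }

  // drive all transfers until none are still running
  do {
      curl_multi_exec($mh, $still_running);
      curl_multi_select($mh);   // wait for activity instead of spinning
  } while ($still_running > 0);

  foreach ($handles as $url => $ch) {
      $page = curl_multi_getcontent($ch);   // the fetched body
      curl_multi_remove_handle($mh, $ch);
      curl_close($ch);
  }
  curl_multi_close($mh);
  ?>

It would still need some logic to only add a handful of URLs at a time so
it does not swamp the line, and I have not actually tried it yet.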
So it's a tricky problem. Short of writing my own script with a locking
and fetching scheme (which is a drag, what with keeping track of duplicate
URLs etc.), I imagine other people who use the spider options would
like this as well.
I did already write a shell script utility for something else which has a
process limiter, but I still have the "did we fetch this page yet?"
problem, which for large sites is probably best solved with something like a DB.
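For smaller runs I figure a flat file of already-fetched URLs would do; a
rough sketch (again assuming PHP 5; the file name and function names are
made up):

  <?php
  // Sketch of the "did we fetch this page yet?" bookkeeping: one URL per
  // line in a plain text file. Fine for small runs, too slow for big sites.
  function already_fetched($url, $listfile = 'fetched.list') {
      if (!file_exists($listfile)) {
          return false;
      }
      $seen = file($listfile, FILE_IGNORE_NEW_LINES);
      return in_array($url, $seen);
  }

  function mark_fetched($url, $listfile = 'fetched.list') {
      file_put_contents($listfile, $url . "\n", FILE_APPEND);
  }
  ?>

For a big site the linear scan would hurt, which is where the DB idea comes in.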
Cheers
Seth