curl-users
Re: GET request
Date: Tue, 27 Nov 2001 19:19:03 +0400
Hi Nick,
At 14:19 26-11-2001 +0200, Nick Chirca wrote:
>The bandwidth from my end is a medium one. I mean, it's not a very good
>one, but not a bad one either. I know there is a limit of 1 GB per month,
>but the traffic in a month is at least a few times more than 1 GB. I
>don't know the exact bandwidth right now (before I ask our sysadmin), but
>the real bandwidth is around 64K/sec, I think.
If there is a limit of 1 GB per month, the site owner will have to pay for
the additional traffic. It is understandable that the sysadmin would disable
your account to reduce the traffic. 64 Kb/sec is not much bandwidth, and I
would not let users run web robots on such a link.
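(For scale, assuming "64K/sec" means 64 KB/sec: a 1 GB monthly allowance
lasts 1 GB / 64 KB/sec = 16,384 seconds, roughly 4.5 hours of continuous
transfer. If it means 64 kbit/sec, that is about 36 hours. Either way, a
busy robot can burn through the whole month's quota in well under two days.)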
>Well, my script extracts information from every page it gets from that
>server (www.blackvoices.com). And the time between GET requests is at
>least 1 sec. I will add "sleep(60)" between HTTP requests though and see
>if this works. But I had the same problem (downloading a page, in a
>loop) with the other server I wanted to crawl (www.findmymate.com). I will
>try your suggestion and see what happens.
One web request per second can be considered abusive, especially if you are
trying to pull every page from a website. Given that your app works as a
web robot, you should check the site's robots policy (its /robots.txt file)
before crawling it.
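For illustration, here is a minimal sketch of a polite fetch loop. It is
written in Python (the thread does not say what language your script uses);
the site URL and the 60-second delay come from this thread, and everything
else, including the polite_get helper name, is an assumption:

    import time
    import urllib.robotparser
    import urllib.request

    BASE = "http://www.blackvoices.com"  # site mentioned in this thread
    DELAY = 60                           # seconds between requests, per the sleep(60) idea

    # Fetch and parse the site's robots exclusion file before crawling.
    robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
    robots.read()

    def polite_get(url):
        # Skip anything the robots policy disallows for generic user agents.
        if not robots.can_fetch("*", url):
            return None
        with urllib.request.urlopen(url) as response:
            body = response.read()
        time.sleep(DELAY)  # throttle so the crawl leaves bandwidth for other users
        return body

The same idea carries over to a curl-based script: sleep between
invocations to keep the request rate down, and fetch /robots.txt first so
you know which paths the site forbids to robots.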
Webmasters make it difficult for people to access their websites without a
browser partly because of such behaviour. Using cURL is about more than
running it at the command line: you should also follow Netiquette.
Regards,
-sm