Subject: Re: GET request
From: SM <nntp_at_iname.com>
Date: Tue, 27 Nov 2001 19:19:03 +0400

Hi Nick,
At 14:19 26-11-2001 +0200, Nick Chirca wrote:
>The bandwidth from my end is a medium one. I mean, it's not a very good
>one, but not a bad one either. I know there is a limit of 1 GB per month,
>but the traffic in a month is at least a few times more than 1 GB. I
>don't know the exact bandwidth right now (before I ask our sysadmin), but
>the real bandwidth is around 64K/sec I think.

If there is a limit of 1 GB per month, the site owner will have to pay for
the additional traffic. It is understandable that the sysadmin would disable
your account to reduce the traffic. A 64 Kb/sec link is not much bandwidth,
and I would not let users run web robots on such a link.
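If raw bandwidth is part of the concern, later curl releases offer a
--limit-rate option that caps the transfer speed of a single fetch. A
minimal sketch (the URL is only a placeholder):

    # Cap this transfer at 10 KB/sec so one fetch cannot
    # saturate a slow shared link (placeholder URL).
    curl --limit-rate 10k http://www.example.com/page.html

This only throttles individual transfers; it does not replace spacing
your requests out over time.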

>Well, my script extracts information from every page it gets from that
>server (www.blackvoices.com). And the time between GET requests is at
>least 1 sec. I will add "sleep(60)" between HTTP requests though and see
>if this works. But I had the same problem (downloading a page in a
>loop) with the other server I wanted to crawl (www.findmymate.com). I will
>try your suggestion and see what happens.

One request per second can be considered abusive, especially if you are
trying to pull every page from a website. Given that your app is working
as a web robot, you should check the site's robots policy (its robots.txt)
before crawling through it. A polite crawl might look like the sketch below.
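This is only a rough shell sketch, not your script: the page names are
placeholders, and the robots.txt step here is a manual inspection rather
than a real parser.

    # Fetch the site's robots policy first and read it by hand
    # before deciding what is fair to crawl.
    curl http://www.blackvoices.com/robots.txt

    # Then space the actual requests well apart, e.g. one per
    # minute, as you suggested with sleep(60).
    for page in page1.html page2.html page3.html
    do
        curl -O http://www.blackvoices.com/$page
        sleep 60
    done

The exact delay matters less than the principle: the load you place on the
server should look like a patient human, not a tight loop.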

Webmasters make it difficult for people to access their websites without a
browser partly because of such behaviour. Using cURL is more than running
it at the command line; you should also follow Netiquette.

Regards,
-sm