curl-users

Re: How to send requests so the server does NOT identify Curl?

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 15 Jun 2007 11:35:09 +0200 (CEST)

On Fri, 15 Jun 2007, Erick Papadakis wrote:

> After some investigation, I am discovering that many websites can in fact
> tell that their page was requested through a Curl call. Is this something
> that we can do nothing about, or can we request pages in a manner that masks
> the request call as being from a regular browser, and does not sound like a
> Curl request at all?

This is not a small subject.

The short answer: yes, you can most likely make requests that the server
cannot distinguish from those of a regular browser.

The longer answer: web scripts typically check all sorts of HTTP headers to
detect which browser is running so that they can feed back browser-dependent
content. They also typically check the Referer header to prevent "deep
linking", and they verify cookies to check your logged-in state and that your
browser is cookie-enabled. They may also check other headers for other
functionality, to better track which browser you use, or simply to detect
automated use.

Then, enter JavaScript. Sites can use it to make your browser do all sorts of
funny business and to automate operations (redirecting, setting/removing
cookies, posting forms, ...). Since curl cannot run JavaScript, this raises
the bar one notch: you need to do exactly what the script does, but without
running it.

curl is not capable of making requests *IDENTICAL* to the browsers', but then
there are so many browsers out there that most servers want to support, so
they cannot check for or require a fixed look either.

(I'm not sure it is a benefit for us if I clearly state those limitations
publicly so I avoid that here, everyone is free to read the sources anyway.)

Mostly you just need to record your browser's activities with LiveHTTPHeaders
or a similar tool, then clone as many of the headers as possible so that the
curl request looks as similar to the browser's as possible.
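
As a rough sketch of what such cloning can look like on the command line: the
function name fetch_like_browser, the cookies.txt file name and every header
value below are made up for illustration and are not taken from any real
capture. Replace them with whatever your own browser actually sent.

```shell
#!/bin/sh
# Sketch only: replay a browser-like request with curl.
# The User-Agent and Accept values are placeholders (assumptions);
# copy the real ones from your own recorded browser session.

UA='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4'

# Hypothetical helper: fetch_like_browser URL REFERER
fetch_like_browser() {
    url=$1
    referer=$2
    # --user-agent / --referer set the two headers sites check most often.
    # --cookie reads cookies from cookies.txt, --cookie-jar saves new ones
    # back to it, so a logged-in state survives between calls.
    # --location follows redirects; --compressed asks for compressed
    # responses like browsers do.
    curl --silent --show-error --location --compressed \
         --user-agent "$UA" \
         --referer "$referer" \
         --cookie cookies.txt --cookie-jar cookies.txt \
         --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
         --header 'Accept-Language: en-us,en;q=0.5' \
         "$url"
}
```

You would then call it as, for example,
fetch_like_browser 'http://example.com/page' 'http://example.com/'
and compare what curl sends (add --verbose) against the recorded browser
request, adding further --header options until the two match closely enough.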

I have yet to find a site that manages to block curl but allows browsers.

-- 
  Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
Received on 2007-06-15