curl-users

Re: Using curl for spidering

From: Ralph Mitchell <ralphmitchell_at_gmail.com>
Date: Mon, 29 Nov 2004 08:45:49 -0600

I haven't dug into Expedia much, but just looking at their home page I
can see this JavaScript block at the top of the BODY section:

        var d, u, s;
        d = new Date();
        s="";
        u = "/pub/agent.dll?qscr=home&&BCheck=1"+s+"&zz="+d.getTime();

        document.cookie = "jscript=1; path=/;";

        window.location.replace(u);

If that's the kind of thing you're looking at in the flight selection
pages, it shouldn't be too hard to fake. All the above code does is
build a new URL in the "u" variable, set a "jscript=1" cookie, and then
point the page location at that URL. It's a client-side redirect,
that's all. Since curl doesn't run JavaScript, -L won't follow it, so
your script has to request that URL itself. The "/pub/agent.dll" lives
at the server end, and requesting it is the same as requesting any
other CGI or server-based script.
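
As a rough sketch of faking that redirect from a script, the Python
below rebuilds the same URL and assembles a curl command line that
sends the cookie the JavaScript would have set. The host name, the
cookie jar file name and both function names are my assumptions; only
the path, the query string and the cookie come from the page source
above.

```python
import time

# Assumed host: the page source above only shows the path.
BASE = "http://www.expedia.com"


def build_redirect_url(now_ms=None, s=""):
    """Rebuild the URL that the page's JavaScript stores in `u`."""
    if now_ms is None:
        # Date.getTime() returns milliseconds since the epoch.
        now_ms = int(time.time() * 1000)
    return f"{BASE}/pub/agent.dll?qscr=home&&BCheck=1{s}&zz={now_ms}"


def curl_command(url, cookie_jar="cookies.txt"):
    """Build a curl argument list that mimics what the browser would send."""
    return [
        "curl",
        "-L",               # follow any ordinary HTTP redirects that come next
        "-b", "jscript=1",  # the cookie that document.cookie would have set
        "-c", cookie_jar,   # save cookies the server sends back (hypothetical file)
        url,
    ]
```

Passing the list to subprocess.run(curl_command(build_redirect_url()))
would make the request. Whether the server checks anything beyond the
cookie and the "zz" timestamp is something you'd have to confirm by
looking at the real traffic.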

If you need to verify what to send to the server, try LiveHTTPHeaders
with Firefox. It's an extension that shows all the headers that pass
between Firefox and a remote web server. You can get it here:

     http://livehttpheaders.mozdev.org/

It even shows headers from secure (HTTPS) connections, which is
difficult to manage with a network monitor, since that only sees the
encrypted traffic...

If Expedia really is executing some local DLL on your PC, you should
still be able to see the headers that fly back and forth (which will
be *after* the DLL executes), and then you can work out how to
generate them in your script without using the DLL. I think it's more
likely that the DLL is a server-side CGI kind of thing, though.
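
Once you have a captured request, something like the following could
turn its header lines into curl -H options. This is only a sketch: the
function name is mine, and it assumes the capture is plain text with
one "Name: value" header per line after the request line.

```python
def headers_to_curl_args(captured):
    """Convert captured 'Name: value' request header lines into curl -H options."""
    # curl generates these itself, so replaying them by hand is not needed.
    skip = {"host", "content-length"}
    args = []
    for line in captured.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines and the request line (no colon, or a "name"
        # containing spaces), plus the headers curl manages on its own.
        if not sep or not name or " " in name or name.lower() in skip:
            continue
        args += ["-H", f"{name}: {value.strip()}"]
    return args
```

The resulting list can be appended to a curl argument list before the
URL. Cookie headers could instead be passed with -b, which is usually
easier to keep in sync with a cookie jar.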

Ralph Mitchell

On Sun, 28 Nov 2004 15:06:51 -0800, ravinder dharmapuram
<dharmapu_at_usc.edu> wrote:
> hi,
>
> We have used curl as an effective tool to spider web sites.
> We can handle cookies and also output header formats as required.
> However, sites like "expedia" present a problem.
> When the user selects a choice for the flight, I think a dll file gets executed.
> The user also gets redirected to another page.
>
> Can curl be used to handle such sites?
> On other sites, page redirection can be handled using the -L option.
>
> Any help would be appreciated
>
> Regards
> Ravinder
>
>
>
Received on 2004-11-29