curl-users

Re: CURL and advertising

From: Jean Marie COUPRIE <abies20_at_neuf.fr>
Date: Tue, 13 Nov 2007 15:53:08 +0100

I do not work with curl or the C libraries directly: I use the Rexx language
and Mark Hessling's port of curl to Rexx (RexxCurl); see:
http://www.rexx.org/
http://rexxcurl.sourceforge.net/index.html
http://sourceforge.net/project/showfiles.php?group_id=30502
I have never used Perl...

> From: Doug McNutt
>> How can I ask Curl to put into variables the cookies sent to or from the URLs that manage advertising, so that I can parse them with my program?
>
> Have a look at:
> <http://macnauchtan.com/software/FinpMod/FinpMod.html>
>
> It's what I use for financial reports, but it shows a way to get into sites a bit like the ones you seem to be describing. The non-curl parts are in Perl, which may not be to your liking, but you should be able to see what has to be done by reading the comments.
>
> Curl now provides some of the things that I once did by hand. The Cookie jar comes to mind.
Thanks, this site seems interesting. I'll try to understand what the Perl
script does, but this will take time given my ignorance of Perl...

>From: Alessandro Vesely <vesely_at_tana.it>
>From the man page, http://curl.haxx.se/docs/manpage.html
>--cookie-jar <file name> to store cookies
>--cookie <file name> to pass them along with the request.
I know this, but the cookie jar does not include the cookies exchanged with
the advertising manager's site.
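For instance, one can check what actually ended up in the jar after a single
fetch (the URL and the 'adserver' pattern are placeholders):

    curl -s --cookie-jar cookies.txt -o page.html 'http://www.example.com/'
    grep -i 'adserver' cookies.txt

The jar only records cookies from responses curl itself received, and curl
fetches just the one URL it is given; it does not load the page's
sub-resources (ad images, scripts) the way a browser does, so their
Set-Cookie headers never reach it.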
>Note that cookies live in the http protocol headers, not in
>the actual page content.
liveHTTPHeaders shows this.
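curl can also show those headers itself, for example (placeholder URL):

    curl -s -D - -o /dev/null 'http://www.example.com/'

--dump-header (-D) writes the response headers, including any Set-Cookie
lines, to the named file; '-' means standard output.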
>Curl can manage cookies. Not JavaScript. You need to work out what
>JavaScript is required for in order to automate the process. The
>script can change form field values and also make http requests.
>The cookie will be in curl's cookie jar after the first request.
>Probably you (also) have to parse the page's content. Save that to
>a file and use regular expressions (e.g. in perl or sed) to extract
>field values from the template that the server used to synthesize the
>content it served to you. That way you prepare the next curl request.
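A rough shell sketch of that two-step flow (the URLs and the field name
"token" are invented for illustration):

    # first request: store the cookies and keep the body for parsing
    curl -s --cookie-jar cookies.txt -o page.html 'http://www.example.com/form'
    # pull a hidden field out of the saved page
    token=$(sed -n 's/.*name="token" value="\([^"]*\)".*/\1/p' page.html)
    # second request: send the cookies back along with the extracted value
    curl -s --cookie cookies.txt --data "token=$token" 'http://www.example.com/submit'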
In another case I downloaded and read all the *.js files (though not the
.jpg or .tiff files) listed as GETs in liveHTTPHeaders; apparently some
scripts are still missing and I cannot discover how some fields are
generated... Do you know a way to un-hide them?
>If that's overly complicated, it may be a better fit to use the curl library
>directly from Perl or whatever language you are comfortable using for
>parsing, rather than invoking curl from a shell.
See the top of this message: with RexxCurl it is easy to load the content of
the page into a set of variables (a stem) and parse it. To parse a file you
first have to read it (provided no one else has exclusive access) into a set
of variables. But the interesting cookie is not in the cookie jar!
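One workaround, sketched with placeholder URLs: take the advertising manager
request that liveHTTPHeaders shows as a GET and replay it with curl against
the same jar, so that any cookie it sets by header is stored:

    curl -s --cookie cookies.txt --cookie-jar cookies.txt -o /dev/null \
         'http://ads.example.net/serve?id=123'

If the cookie still never shows up in any Set-Cookie header, it is being
created by JavaScript (document.cookie) and curl alone cannot record it; its
value would then have to be reconstructed by reading the script, as the
advice above suggests.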
Received on 2007-11-13