Re: cURL starting questions

From: Ralph Mitchell <ralphmitchell_at_gmail.com>
Date: Sun, 19 Apr 2009 13:01:42 -0500

On Sun, Apr 19, 2009 at 9:41 AM, Jason Todd Slack-Moehrle
<mailinglists_at_mailnewsrss.com> wrote:

> Hi Ralph,
>
>>> I have some starting cURL questions that I am hoping to gain insight
>>> about.
>>>
>>> I want to start at Dmoz.org and follow links for entertainment (like
>>> concerts, art gallery events, etc) and examine the link to see if I
>>> should get data back about it and from it.
>>
>> You should probably start here:
>>
>> http://curl.haxx.se/docs/httpscripting.html
>>
>> Curl will only grab a web page for you, it won't attempt to interpret
>> the page. It won't even download images or script files unless you
>> extract the relevant urls from any given page and perform subsequent
>> fetches.
>
> So what tool does one use to evaluate the links, etc? How can I make
> decisions and such?

I developed my scripts on several Linux platforms, so I used grep, sed, awk
and similar command-line tools to extract bits from the saved web pages and
check for specific keywords. Something like this:

   curl -o home.html http://some.server.com
   X=`grep -ic 'My Stuff' home.html`
   if [ "$X" -ne 1 ]; then
      # didn't find 'my stuff', something went wrong, bail out
      exit 1
   fi
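
If all you care about is whether the fetch itself worked, curl can also
report the HTTP status code directly through its -w option, so you don't
have to grep the body at all. A rough sketch, using the same made-up
server name:

   # -s silences the progress meter, -w prints the status code on stdout
   CODE=`curl -s -o home.html -w '%{http_code}' http://some.server.com`
   if [ "$CODE" -ne 200 ]; then
      # server didn't return 200 OK, bail out
      exit 1
   fi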

   # get the 'my stuff' link and hack off all the extra bits
   LINK=`grep 'My Stuff' home.html | sed -e .........`

   lather, rinse, repeat.
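
To make that sed step concrete: assuming (hypothetically) the page holds
the link as <a href="/mystuff/index.html">My Stuff</a>, one way to fill in
that expression would be:

   # hypothetical: keep only the href value from the matching line
   LINK=`grep -i 'My Stuff' home.html | sed -e 's/.*href="\([^"]*\)".*/\1/'`
   curl -o stuff.html "http://some.server.com$LINK"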

At each step you will have to examine the saved page and see how to extract
what you want to get.
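
Since your original goal was following links off a page, the outer loop
might look something like this (a rough sketch, assuming at most one href
per line and absolute urls):

   # rough sketch: pull each href out of the saved page, fetch them in turn
   N=0
   for LINK in `sed -n 's/.*href="\([^"]*\)".*/\1/p' home.html`; do
      N=`expr $N + 1`
      curl -o "page$N.html" "$LINK"
   done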

Ralph Mitchell

-------------------------------------------------------------------
List admin: http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2009-04-19