curl-users
Re: cURL starting questions
Date: Sun, 19 Apr 2009 13:01:42 -0500
On Sun, Apr 19, 2009 at 9:41 AM, Jason Todd Slack-Moehrle <
mailinglists_at_mailnewsrss.com> wrote:
> Hi Ralph,
>
> I have some starting cURL questions that I am hoping to gain insight about.
>>
>> I want to start at Dmoz.org and follow links for entertainment (like
>> concerts, art gallery events, etc) and examine the link to see if I should
>> get data back about it and from it.
>
>
> You should probably start here:
>
> http://curl.haxx.se/docs/httpscripting.html
>
> Curl will only grab a web page for you, it won't attempt to interpret the
> page. It won't even download images or script files unless you extract the
> relevant urls from any given page and perform subsequent fetches.
>
>
> So what tool does one use to evaluate the links, etc? How can I make
> decisions and such?
>
I developed my scripts on several Linux platforms, so I used grep, sed, awk
and similar command line tools to extract bits from the saved web pages and
check for specific keywords. Something like this:
curl -o home.html http://some.server.com
X=`grep -ic 'My Stuff' home.html`
if [ "$X" -ne "1" ]; then
# didn't find 'My Stuff', something went wrong, bail out
exit
fi
# get the 'my stuff' link and hack off all the extra bits
LINK=`grep 'My Stuff' home.html | sed -e .........`
lather, rinse, repeat.
At each step you will have to examine the saved page and see how to extract
what you want to get.
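Filling in the elided sed step, the whole loop body might look something like
the sketch below. It runs against a canned sample page rather than a live
curl fetch, and the page contents, link text, and URL are invented purely for
illustration -- your own sed expression will depend on the markup of the
actual page you save:

```shell
# Stand-in for "curl -o home.html http://some.server.com" -- a saved sample page.
cat > home.html <<'EOF'
<html><body>
<a href="http://some.server.com/events/concerts.html">My Stuff</a>
</body></html>
EOF

# Count lines containing the key phrase; bail out unless it appears exactly once.
X=$(grep -ic 'My Stuff' home.html)
if [ "$X" -ne 1 ]; then
  echo "didn't find 'My Stuff', something went wrong, bailing out" >&2
  exit 1
fi

# Pull out the href value, hacking off the surrounding markup.
LINK=$(grep 'My Stuff' home.html | sed -e 's/.*href="\([^"]*\)".*/\1/')
echo "$LINK"

# Next iteration would be: curl -o next.html "$LINK", then grep/sed again.
```

The sed expression here just captures whatever sits between `href="` and the
next double quote; for messier real-world pages you may need several `-e`
expressions chained together, exactly as the original snippet hints.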
Ralph Mitchell
-------------------------------------------------------------------
List admin: http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2009-04-19