Re: Curl speed / timeout options

From: sds ssdsd <fwprc13_at_yahoo.com>
Date: Mon, 17 Oct 2005 15:34:39 +0100 (BST)

--- Magnus Stålnacke <jemamo_at_telia.com> wrote:

> sds ssdsd wrote:
> >
> > 2) My second question is 'curl oriented': in shell
> > scripts, when I curl some web pages, I always resort
> > to "sed" or "grep" to parse them and do the work.
> > This is not a very neat solution. I would like to be
> > able to perform XML "XPath requests" on HTML pages,
> > but: I tried XML parsers, which failed because HTML
> > pages are not well-formed XML documents (the parsers
> > find lots of errors, even on very simple web pages).
> > Is anyone aware of such a tool?
> >
>
> Maybe "lynx -dump" piped to some sed/grep regexp
> can do the trick for you?
>
> I think you have to be a little more specific about
> your "problem": is it content or code you want to do
> your stuff on?
>
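
(If I understand the suggestion, it amounts to a pipeline
along these lines; the URL and pattern are placeholders:)

  # Hypothetical sketch of the lynx -dump approach: render the
  # page as plain text, then pick the wanted line out with grep.
  lynx -dump "http://www.example.com/page.html" | grep -i "total:"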

Ok, I'll try to explain a little deeper. I mainly use
curl to emulate user clicks and surf through certain
web pages in batches. To do that, as you surf you have
to parse the actual contents of the current page in
order to fetch some crucial values (the names and
values of the form controls, some dynamic HTTP links,
etc.). curl does a great job of fetching pages and
handling all the submit and cookie stuff, but I need
some tool that helps parse the actual content. You can
see that lynx -dump is not an option, because you
completely lose the document structure and keep just
the "visual aspect".
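
To make it concrete, here is a rough sketch of what I do
today; every URL and field name below is made up, and the
sed pattern is far too fragile for real pages, which is
exactly the problem:

  # Hypothetical sketch: fetch a form page, scrape a hidden
  # control's value with sed, then submit the form with it.
  page=$(curl -s -c cookies.txt "http://www.example.com/form.html")

  # Fragile: breaks as soon as attribute order or quoting changes.
  token=$(echo "$page" \
    | sed -n 's/.*name="token" value="\([^"]*\)".*/\1/p')

  curl -s -b cookies.txt \
    -d "token=$token" -d "action=next" \
    "http://www.example.com/submit" > next_page.html

What I am after is a tool that would let me replace that
sed line with an XPath query that still works on sloppy,
non-well-formed HTML.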
Thanks for reading.
Jerome
