curl-library

[OT] Simple HTML parser needed

From: Daniel Haude <dunno_at_stoptrick.com>
Date: Mon, 31 Jan 2005 14:52:16 +0100

I know this is off-topic for this list, but since I believe many people
use libcurl to retrieve HTML data from which they then extract
information, I suspect there is some knowledge here on how to parse
HTML.

I'm not looking for a full-fledged DOM parser, just something that
produces a "flat" stream of tags with attributes and normal text.

I can roll my own, but I'd like to know if there's some "industry
standard" thing for this. I know plenty of XML parsers, but none of them
seems to like digesting the typical broken HTML found on many web pages.
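
To make the request concrete, here is a minimal sketch of the kind of
"flat" stream meant above. It assumes Python and its standard-library
html.parser module, neither of which is part of the original question;
the names FlatTokenizer and tokenize are hypothetical and only for
illustration. It is one possible shape for such a tool, not a
recommendation of a standard.

```python
from html.parser import HTMLParser


class FlatTokenizer(HTMLParser):
    """Collects a flat list of tokens instead of building a DOM tree."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        text = data.strip()
        if text:
            self.tokens.append(("text", text))


def tokenize(html):
    """Hypothetical helper: return the flat token list for an HTML string."""
    parser = FlatTokenizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.tokens


if __name__ == "__main__":
    # The <b> tag is never closed; no tree is built, so nothing breaks.
    for token in tokenize('<p class="x">Hello <b>world</p>'):
        print(token)
```

Because the tokens are emitted in document order without any attempt to
match opening and closing tags, an unclosed tag like the <b> above simply
shows up as a start token with no corresponding end token.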

Thanks,
Daniel
Received on 2005-01-31