curl-library
[OT] Simple HTML parser needed
From: Daniel Haude <dunno_at_stoptrick.com>
Date: Mon, 31 Jan 2005 14:52:16 +0100
I know this is off-topic for this list, but many people use libcurl to
retrieve HTML pages and then extract information from them, so I suspect
there is some knowledge here about how to parse HTML.
I'm not looking for a full-fledged DOM parser, just something that
produces a "flat" stream of tags with attributes and normal text.
I can roll my own, but I'd like to know whether there's some "industry
standard" tool for this. I know plenty of XML parsers, but none of them
seems to cope with the typical broken HTML found on many web pages.
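To illustrate the kind of "flat" stream meant here, the following is a minimal sketch using Python's standard `html.parser` module. The choice of Python and the names `FlatStream` and `flatten` are assumptions made for illustration only; the post does not name a language or a library. The parser emits start tags (with attributes), end tags, and text as a flat list of events, builds no tree, and does not reject unquoted attributes or unclosed tags.

```python
from html.parser import HTMLParser


class FlatStream(HTMLParser):
    """Collect a flat list of events instead of building a document tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags
        text = data.strip()
        if text:
            self.events.append(("text", text))


def flatten(html):
    """Return the flat event list for an HTML string (hypothetical helper)."""
    parser = FlatStream()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Unquoted attribute and an unclosed <b>: typical "broken" markup
    for event in flatten("<p class=x>Hi <b>there</p>"):
        print(event)
```

Because no tree is built, an unclosed element such as the `<b>` above simply never produces an end event; deciding what to do about that is left to the caller.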
Thanks,
Daniel
Received on 2005-01-31