curl-library
Re: Save as text, lynx -dump
From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Wed, 7 May 2003 21:20:59 -0700
On Thu, May 08, 2003 at 12:39:00PM +1000, James Wettenhall wrote:
> Is there a way to imitate lynx -dump using libcurl?
> i.e. I want to save a webpage as text using an API,
> rather than using a system call to "lynx".
Downloading the HTML (what libcurl does) is only a small part of converting
a web page to text. The hard part is parsing the HTML and rendering a page
that looks half decent. If you just want the raw text and don't care how it
looks (for indexing or something), then it's pretty easy to write a parser
that just throws out everything between < and >. Otherwise, you'll end up
rewriting most of lynx. I can't think of many situations where that would
be a win.
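
For what it's worth, the "throw out everything between < and >" approach could be sketched roughly like this. This is only an illustration in Python, not anything libcurl provides: the function name strip_tags is made up, and it assumes the page has already been downloaded (by libcurl or otherwise) into a string.

```python
def strip_tags(html: str) -> str:
    """Crude HTML-to-text: drop everything between '<' and '>'.

    Hypothetical helper for illustration only. `html` is assumed to be
    the already-downloaded page source as a string.
    """
    out = []
    in_tag = False
    for ch in html:
        if ch == "<":
            in_tag = True
            out.append(" ")  # treat each tag as a word separator
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            out.append(ch)
    # Collapse the runs of whitespace left behind by removed tags.
    return " ".join("".join(out).split())


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello, <b>world</b>!</p></body></html>"
    print(strip_tags(page))
```

Something this simple keeps the contents of script and style blocks, leaves entities such as &amp; undecoded, gets confused by a ">" inside a comment or attribute value, and ignores layout entirely, which is why it is probably only good enough for indexing.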
>>> Dan
-- 
http://www.MoveAnnouncer.com    The web change of address service
Let webmasters know that your web site has moved
Received on 2003-05-08