curl-library

Re: Parse html code

From: codemastr <codemstr_at_ptd.net>
Date: Tue, 4 May 2004 17:17:04 -0400

simple.c won't help you here: it just prints the file to the screen. You
want to read the file into memory, so look at
http://curl.haxx.se/lxr/source/docs/examples/getinmemory.c
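
To give a rough idea of what that example does: the core is a write callback
that appends each block of data libcurl delivers to a growing buffer. The
following is a minimal sketch modelled on getinmemory.c, not a copy of it.
The names MemoryStruct and write_memory_callback are assumptions used for
illustration, and the libcurl calls appear only in a comment so the sketch
compiles without libcurl installed.

```c
#include <stdlib.h>
#include <string.h>

/* Growable buffer holding everything received so far. */
struct MemoryStruct {
    char *memory;   /* NUL-terminated data */
    size_t size;    /* number of bytes stored, not counting the NUL */
};

/* Has the signature libcurl expects for CURLOPT_WRITEFUNCTION.
 * Appends size * nmemb bytes from contents to the buffer in userp. */
size_t write_memory_callback(void *contents, size_t size, size_t nmemb,
                             void *userp)
{
    size_t realsize = size * nmemb;
    struct MemoryStruct *mem = userp;

    /* realloc(NULL, n) behaves like malloc(n), so an empty buffer is fine. */
    char *grown = realloc(mem->memory, mem->size + realsize + 1);
    if (grown == NULL)
        return 0;   /* returning fewer bytes than received aborts the transfer */

    mem->memory = grown;
    memcpy(mem->memory + mem->size, contents, realsize);
    mem->size += realsize;
    mem->memory[mem->size] = '\0';
    return realsize;
}

/* With libcurl, the callback would be hooked up roughly like this:
 *
 *   struct MemoryStruct chunk = { NULL, 0 };
 *   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_memory_callback);
 *   curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&chunk);
 *   curl_easy_perform(curl);
 *   ... use chunk.memory ...
 *   free(chunk.memory);
 */
```

libcurl may call the callback many times for one page, which is why the
buffer has to grow instead of being allocated once.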

Using that code, the HTML file would be in chunk.memory. You don't have to
convert HTML to a char *, because it already is a character string! So you
would simply parse it yourself, or use another library to parse it.
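
If you go the "parse it yourself" route, a very naive scan for anchors over
an in-memory string could look something like the sketch below. This is an
illustration only, not a real HTML parser: it assumes double-quoted href
values and no '>' inside attribute values, and the names find_anchor_hrefs,
starts_with_ci and MAX_HREF are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_HREF 256   /* hypothetical limit on a stored href, including NUL */

/* Case-insensitive test: does s begin with prefix? */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Copies the href value of each <a ...> tag in html into out[],
 * up to max_out entries, and returns how many were found. */
size_t find_anchor_hrefs(const char *html, char out[][MAX_HREF],
                         size_t max_out)
{
    size_t count = 0;
    const char *p = html;

    while (count < max_out && (p = strchr(p, '<')) != NULL) {
        p++;   /* step past '<' */
        if (tolower((unsigned char)p[0]) != 'a' ||
            !isspace((unsigned char)p[1]))
            continue;   /* not an <a ...> tag with attributes */

        const char *end = strchr(p, '>');
        if (end == NULL)
            break;      /* unterminated tag, give up */

        /* Look for an attribute named href inside this tag. */
        for (const char *q = p + 1; q < end; q++) {
            if (isspace((unsigned char)q[-1]) &&
                starts_with_ci(q, "href=\"")) {
                const char *val = q + 6;
                const char *close = strchr(val, '"');
                if (close == NULL || close > end)
                    break;   /* malformed value, skip this tag */
                size_t len = (size_t)(close - val);
                if (len >= MAX_HREF)
                    len = MAX_HREF - 1;   /* truncate overly long values */
                memcpy(out[count], val, len);
                out[count][len] = '\0';
                count++;
                break;
            }
        }
        p = end;
    }
    return count;
}
```

You would call it with the buffer from the transfer, e.g.
find_anchor_hrefs(chunk.memory, hrefs, 64) with hrefs declared as
char hrefs[64][MAX_HREF]. Real-world HTML is messy enough that a proper
parsing library is likely the safer choice.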

libcurl provides *no* routines for parsing HTML. It's designed to transfer
files, not to interpret them. Libtidy would probably provide the HTML
parsing you need. You can find it at http://tidy.sourceforge.net. I've never
used it myself, but from what I understand, it does HTML parsing.

You can find a reference to libtidy, and other libraries that might be
useful with libcurl at http://curl.haxx.se/libcurl/relatedlibs.html

Hope that helps!

Dominick Meglio

----- Original Message -----
From: <jan.leynen_at_student.luc.ac.be>
To: <curl-library_at_cool.haxx.se>
Sent: Tuesday, May 04, 2004 12:41 PM
Subject: Parse html code

> Hi,
> I was looking at "http://curl.haxx.se/lxr/source/docs/examples/simple.c"
>
> Now my questions are:
> - Where can I find the code of the HTML page? I suppose in the CURL
>   *pointer, but is this CURL type a struct with a char pointer included,
>   or what is it?
> - I need the HTML code to find all the anchors in it. Does a function
>   that does this already exist in curl? Or can I convert the HTML code
>   to a char pointer and parse it with my C program?
> - So the most important thing I need to know is how to get the HTML
>   code and how I can use it in my C program.
>
> Can somebody help me please :o) .
> Thaaaaaaaannnnnnnks a lot.
> Jan
Received on 2004-05-04