Re: I need help getting a web page

From: Hans Henrik Bergan via curl-users <curl-users_at_lists.haxx.se>
Date: Tue, 12 Oct 2021 12:04:28 +0200

Try digging this company name out of the HTML (the "&" is escaped as &amp; in the page source):
<span>AT&amp;T</span>

The correct text, which a proper HTML parser will give you: AT&T
What a regex extraction will give you: AT&amp;T
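A minimal sketch of the company-name example, assuming Python 3 and its standard-library html.parser module. The helper names SpanText, with_parser and with_regex are hypothetical, chosen only for this illustration, and the regex is just one typical quick extraction pattern, not the only one someone might write.

```python
import re
from html.parser import HTMLParser

# Raw HTML as served: the "&" is escaped as "&amp;" in the source.
RAW = "<span>AT&amp;T</span>"


class SpanText(HTMLParser):
    """Hypothetical helper: collect text inside <span> elements."""

    def __init__(self):
        # convert_charrefs=True makes handle_data receive entity-decoded text.
        super().__init__(convert_charrefs=True)
        self.in_span = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        if self.in_span:
            self.texts.append(data)


def with_parser(raw):
    """Extract the span text with a real HTML parser."""
    p = SpanText()
    p.feed(raw)
    p.close()
    return "".join(p.texts)


def with_regex(raw):
    """Naive extraction: grab whatever sits between the tags, undecoded."""
    m = re.search(r"<span>(.*?)</span>", raw)
    return m.group(1) if m else None


print(with_parser(RAW))  # AT&T
print(with_regex(RAW))   # AT&amp;T
```

The regex result could be patched afterwards with html.unescape, but that is exactly the kind of special case a parser already handles for you.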
Try digging the link text out of this link:
<a href="foo" title="5>3"> Mathematical proof that 5 is greater than 3! </a>

A regex extraction is very likely to fail here and extract 3"> Mathematical(...),
because it assumes the opening tag ends at the first ">", but ">" is allowed
inside a quoted attribute value. A proper HTML parser will have no problem and
will correctly extract "Mathematical proof that 5 is greater than 3!"
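A minimal sketch of the link example, again assuming Python 3 and the standard-library html.parser module. LinkText, with_parser and with_regex are hypothetical names for this illustration, and the pattern <a[^>]*>(.*?)</a> is an assumed stand-in for the "typical" regex a quick script might use.

```python
import re
from html.parser import HTMLParser

RAW = '<a href="foo" title="5>3"> Mathematical proof that 5 is greater than 3! </a>'


class LinkText(HTMLParser):
    """Hypothetical helper: collect <a> text content and title attributes."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.titles = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            # attrs is a list of (name, value) pairs; the quoted ">" stays in the value.
            self.titles.append(dict(attrs).get("title"))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts.append(data)


def with_parser(raw):
    """Return (title attribute, link text) using a real HTML parser."""
    p = LinkText()
    p.feed(raw)
    p.close()
    return p.titles[0], "".join(p.texts).strip()


def with_regex(raw):
    """Naive pattern: assumes the opening tag ends at the first '>'."""
    m = re.search(r"<a[^>]*>(.*?)</a>", raw)
    return m.group(1).strip() if m else None


print(with_parser(RAW))  # ('5>3', 'Mathematical proof that 5 is greater than 3!')
print(with_regex(RAW))   # 3"> Mathematical proof that 5 is greater than 3!
```

The regex stops at the ">" inside title="5>3", so the leftover 3"> leaks into the extracted text, while the parser tracks the quoting and keeps the attribute and the link text apart.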

But it's only broken code, not life and death.

On Tue, 12 Oct 2021 at 11:16, ToddAndMargo via curl-users <curl-users_at_lists.haxx.se> wrote:

> On 10/12/21 00:02, Hans Henrik Bergan via curl-users wrote:
> > https://stackoverflow.com/a/1732454/1067003
>
> "You can't parse [X]HTML with regex. Because HTML
> can't be parsed by regex. Regex is not a tool that
> can be used to correctly parse HTML"
>
> Just watch me! I dig things out of html code all
> the time. Probably not "parsing" though. Raku's
> regex eats html alive!
>
> --
> Unsubscribe: https://lists.haxx.se/listinfo/curl-users
> Etiquette: https://curl.haxx.se/mail/etiquette.html
>

Received on 2021-10-12

This message: [ Message body ]
Next message: ToddAndMargo via curl-users: "Re: I need help getting a web page"
Previous message: ToddAndMargo via curl-users: "Re: I need help getting a web page"
In reply to: ToddAndMargo via curl-users: "Re: I need help getting a web page"
Next in thread: ToddAndMargo via curl-users: "Re: I need help getting a web page"
Reply: ToddAndMargo via curl-users: "Re: I need help getting a web page"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]