Re: I need help getting a web page
From: Hans Henrik Bergan via curl-users <>
Date: Tue, 12 Oct 2021 15:51:35 +0200
> I'd probably do a
> $ raku
> Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.07.
> Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
> Built on MoarVM version 2021.07.
> To exit type 'exit' or '^D'
> > my $x = Q[AT&amp;T];
also keep in mind that &quot; must be translated to " and &apos; must be
translated to ' and &lt; must be translated to < and &gt; must be
translated to > and &nbsp; must be translated to   and &iexcl; must be
translated to ¡ and &cent; must be translated to ¢ and &pound; must be
translated to £ and &curren; must be translated to ¤ and &yen; must be
translated to ¥ and &brvbar; must be translated to ¦ and &sect; must be
translated to § and &uml; must be translated to ¨ and &copy; must be
translated to © and &ordf; must be translated to ª and &laquo; must be
translated to « and &not; must be translated to ¬ and &shy; must be
translated to ­ and &reg; must be translated to ® and &macr; must be
translated to ¯ and &deg; must be translated to ° and &plusmn; must be
translated to ± and &sup2; must be translated to ² and &sup3; must be
translated to ³ and &acute; must be translated to ´ and &micro; must be
translated to µ and &para; must be translated to ¶ and &middot; must be
translated to · and &cedil; must be translated to ¸ and &sup1; must be
translated to ¹ and &ordm; must be translated to º and &raquo; must be
translated to » and &frac14; must be translated to ¼ and &frac12; must be
translated to ½ and &frac34; must be translated to ¾ and &iquest; must be
translated to ¿ and &Agrave; must be translated to À and &Aacute; must be
translated to Á and &Acirc; must be translated to Â and &Atilde; must be
translated to Ã and &Auml; must be translated to Ä and &Aring; must be
translated to Å and &AElig; must be translated to Æ and &Ccedil; must be
translated to Ç and &Egrave; must be translated to È and &Eacute; must be
translated to É and &Ecirc; must be translated to Ê and &Euml; must be
translated to Ë and &Igrave; must be translated to Ì and &Iacute; must be
translated to Í and &Icirc; must be translated to Î and &Iuml; must be
translated to Ï and &ETH; must be translated to Ð and &Ntilde; must be
translated to Ñ and &Ograve; must be translated to Ò and &Oacute; must be
translated to Ó and &Ocirc; must be translated to Ô and &Otilde; must be
translated to Õ and &Ouml; must be translated to Ö and &times; must be
translated to × and &Oslash; must be translated to Ø and &Ugrave; must be
translated to Ù and &Uacute; must be translated to Ú and &Ucirc; must be
translated to Û and &Uuml; must be translated to Ü and &Yacute; must be
translated to Ý and &THORN; must be translated to Þ and &szlig; must be
translated to ß and &agrave; must be translated to à and &aacute; must be
translated to á and &acirc; must be translated to â and &atilde; must be
translated to ã and &auml; must be translated to ä and &aring; must be
translated to å and &aelig; must be translated to æ and &ccedil; must be
translated to ç and &egrave; must be translated to è and &eacute; must be
translated to é and &ecirc; must be translated to ê and &euml; must be
translated to ë and &igrave; must be translated to ì and &iacute; must be
translated to í and &icirc; must be translated to î and &iuml; must be
translated to ï and &eth; must be translated to ð and &ntilde; must be
translated to ñ and &ograve; must be translated to ò and &oacute; must be
translated to ó and &ocirc; must be translated to ô and &otilde; must be
translated to õ and &ouml; must be translated to ö and &divide; must be
translated to ÷ and &oslash; must be translated to ø and &ugrave; must be
translated to ù and &uacute; must be translated to ú and &ucirc; must be
translated to û and &uuml; must be translated to ü and &yacute; must be
translated to ý and &thorn; must be translated to þ and &yuml; must be
translated to ÿ and &OElig; must be translated to Œ and &oelig; must be
translated to œ and &Scaron; must be translated to Š and &scaron; must be
translated to š and &Yuml; must be translated to Ÿ and &fnof; must be
translated to ƒ and &circ; must be translated to ˆ and &tilde; must be
translated to ˜ and &Alpha; must be translated to Α and &Beta; must be
translated to Β and &Gamma; must be translated to Γ and &Delta; must be
translated to Δ and &Epsilon; must be translated to Ε and &Zeta; must be
translated to Ζ and &Eta; must be translated to Η and &Theta; must be
translated to Θ and &Iota; must be translated to Ι and &Kappa; must be
translated to Κ and &Lambda; must be translated to Λ and &Mu; must be
translated to Μ and &Nu; must be translated to Ν and &Xi; must be
translated to Ξ and &Omicron; must be translated to Ο and &Pi; must be
translated to Π and &Rho; must be translated to Ρ and &Sigma; must be
translated to Σ and &Tau; must be translated to Τ and &Upsilon; must be
translated to Υ and &Phi; must be translated to Φ and &Chi; must be
translated to Χ and &Psi; must be translated to Ψ and &Omega; must be
translated to Ω and &alpha; must be translated to α and &beta; must be
translated to β and &gamma; must be translated to γ and &delta; must be
translated to δ and &epsilon; must be translated to ε and &zeta; must be
translated to ζ and &eta; must be translated to η and &theta; must be
translated to θ and &iota; must be translated to ι and &kappa; must be
translated to κ and &lambda; must be translated to λ and &mu; must be
translated to μ and &nu; must be translated to ν and &xi; must be
translated to ξ and &omicron; must be translated to ο and &pi; must be
translated to π and &rho; must be translated to ρ and &sigmaf; must be
translated to ς and &sigma; must be translated to σ and &tau; must be
translated to τ and &upsilon; must be translated to υ and &phi; must be
translated to φ and &chi; must be translated to χ and &psi; must be
translated to ψ and &omega; must be translated to ω and &thetasym; must be
translated to ϑ and &upsih; must be translated to ϒ and &piv; must be
translated to ϖ and &ensp; must be translated to   and &emsp; must be
translated to   and &thinsp; must be translated to   and &zwnj; must be
translated to ‌ and &zwj; must be translated to ‍ and &lrm; must be
translated to ‎ and &rlm; must be translated to ‏ and &ndash; must be
translated to – and &mdash; must be translated to — and &lsquo; must be
translated to ‘ and &rsquo; must be translated to ’ and &sbquo; must be
translated to ‚ and &ldquo; must be translated to “ and &rdquo; must be
translated to ” and &bdquo; must be translated to „ and &dagger; must be
translated to † and &Dagger; must be translated to ‡ and &bull; must be
translated to • and &hellip; must be translated to … and &permil; must be
translated to ‰ and &prime; must be translated to ′ and &Prime; must be
translated to ″ and &lsaquo; must be translated to ‹ and &rsaquo; must be
translated to › and &oline; must be translated to ‾ and &frasl; must be
translated to ⁄ and &euro; must be translated to € and &image; must be
translated to ℑ and &weierp; must be translated to ℘ and &real; must be
translated to ℜ and &trade; must be translated to ™ and &alefsym; must be
translated to ℵ and &larr; must be translated to ← and &uarr; must be
translated to ↑ and &rarr; must be translated to → and &darr; must be
translated to ↓ and &harr; must be translated to ↔ and &crarr; must be
translated to ↵ and &lArr; must be translated to ⇐ and &uArr; must be
translated to ⇑ and &rArr; must be translated to ⇒ and &dArr; must be
translated to ⇓ and &hArr; must be translated to ⇔ and &forall; must be
translated to ∀ and &part; must be translated to ∂ and &exist; must be
translated to ∃ and &empty; must be translated to ∅ and &nabla; must be
translated to ∇ and &isin; must be translated to ∈ and &notin; must be
translated to ∉ and &ni; must be translated to ∋ and &prod; must be
translated to ∏ and &sum; must be translated to ∑ and &minus; must be
translated to − and &lowast; must be translated to ∗ and &radic; must be
translated to √ and &prop; must be translated to ∝ and &infin; must be
translated to ∞ and &ang; must be translated to ∠ and &and; must be
translated to ∧ and &or; must be translated to ∨ and &cap; must be
translated to ∩ and &cup; must be translated to ∪ and &int; must be
translated to ∫ and &there4; must be translated to ∴ and &sim; must be
translated to ∼ and &cong; must be translated to ≅ and &asymp; must be
translated to ≈ and &ne; must be translated to ≠ and &equiv; must be
translated to ≡ and &le; must be translated to ≤ and &ge; must be
translated to ≥ and &sub; must be translated to ⊂ and &sup; must be
translated to ⊃ and &nsub; must be translated to ⊄ and &sube; must be
translated to ⊆ and &supe; must be translated to ⊇ and &oplus; must be
translated to ⊕ and &otimes; must be translated to ⊗ and &perp; must be
translated to ⊥ and &sdot; must be translated to ⋅ and &lceil; must be
translated to ⌈ and &rceil; must be translated to ⌉ and &lfloor; must be
translated to ⌊ and &rfloor; must be translated to ⌋ and &lang; must be
translated to 〈 and &rang; must be translated to 〉 and &loz; must be
translated to ◊ and &spades; must be translated to ♠ and &clubs; must be
translated to ♣ and &hearts; must be translated to ♥ and &diams; must be
translated to ♦
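
For anyone who wants to see both points in practice, here is a minimal Python 3 sketch. It is an illustration only and not something posted in this thread. It uses just the standard library: html.unescape decodes named entities, and html.parser.HTMLParser pulls the link text out of the title="5>3" example quoted below. The names LinkTextParser, link_texts and naive_link_text are made up for this sketch, and the regex is only one plausible naive pattern.

```python
import html
import re
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text found inside each <a>...</a> element."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities in text
        super().__init__(convert_charrefs=True)
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_link:
            self.texts[-1] += data


def link_texts(markup):
    """Return the whitespace-normalised text of every link in markup."""
    parser = LinkTextParser()
    parser.feed(markup)
    parser.close()
    return [" ".join(text.split()) for text in parser.texts]


def naive_link_text(markup):
    """A deliberately naive regex: it stops at the first '>' it sees."""
    match = re.search(r"<a[^>]*>(.*?)</a>", markup, re.DOTALL)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(html.unescape("AT&amp;T"))  # prints AT&T

    sample = ('<a href="foo" title="5>3"> Mathematical proof '
              'that 5 is greater than 3!\n</a>')
    print(link_texts(sample))             # parser: the full link text
    print(repr(naive_link_text(sample)))  # regex: starts with 3">
```

The regex trips over the ">" inside the quoted attribute value, while the parser tracks attribute quoting and hands back the whole link text with entities already decoded.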
On Tue, 12 Oct 2021 at 13:17, ToddAndMargo via curl-users <> wrote:
> On 10/12/21 03:04, Hans Henrik Bergan via curl-users wrote:
> > try digging this company name out of the HTML:
> > <span>AT&amp;T</span>
> >
> > the correct translation, as a proper HTML parser will get you: AT&T
> > what a regex extraction will get you: AT&amp;T
> > try digging the title out of this link:
> > <a href="foo" title="5>3"> Mathematical proof that 5 is greater than 3!
> > </a>
> >
> > a regex extraction is very likely to fail here, and extract 3">
> > Mathematical(...)
> > while a proper HTML parser will have no problem, and correctly parse out
> > "Mathematical proof that 5 is greater than 3!"
> >
> > but it's only broken code, not life and death.
> I am basically looking for links and revisions.
> But if I had to deal with
> <body>
> AT&amp;T
> </body>
> I'd probably do a
> $ raku
> Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.07.
> Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
> Built on MoarVM version 2021.07.
> To exit type 'exit' or '^D'
> > my $x = Q[AT&amp;T];
> AT&amp;T
> > $x~~s/ ('AT&amp;T') /AT&T/;
> 「AT&amp;T」
> 0 => 「AT&amp;T」
> > say $x
> AT&T
> Revisions and links never have odd characters in them.
> It is far easier for me to just go straight to the
> code itself than trying to translate it to text. Keep
> in mind that I know the pattern I am looking for and
> the rest of the page is just noise to be discarded.
> My biggest difficulty is having to go into
> hexedit to find unprintable characters, but I
> have gotten pretty good at figuring out when
> that is happening and working around them. This
> usually happens when a web designer mixes UTF-8
> and UTF-16 together by accident. I am in UTF-8.