Re: curl getting different html
From: Hans Henrik Bergan via curl-users <curl-users_at_lists.haxx.se>
Date: Wed, 8 Jan 2025 16:45:12 +0100
The website is using Cloudflare's "Bot Fight Mode", which is meant to
protect the website against bots.
When I try it, dillo, w3m, and curl all trigger the "are you a human?"
challenge page, and none of them is capable of passing it.
It is strange that your dillo is not triggering it; that probably has
something to do with the IP address you are running Dillo from.
Anyway, your best bet for automating anything on that page is
headless Chromium running in headful mode. I don't know how CF detects
it, but it flags headless Chromium running headless as a bot, while it
does not flag headless Chromium running in headful mode.
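To tell whether curl received the challenge page rather than the real
content, one option is to look at the response headers instead of the HTML.
The sketch below assumes that Cloudflare marks challenge responses with a
"cf-mitigated: challenge" header; the function name is_cf_challenge and the
file names are made up for illustration, and the header check is a
heuristic, not a guaranteed test.

```sh
#!/bin/sh
# Return 0 if the header dump in file "$1" looks like a Cloudflare challenge.
# Assumption: challenge responses carry a "cf-mitigated: challenge" header.
is_cf_challenge() {
    # curl -D writes headers with CRLF line endings, so strip the CRs first.
    # Header names are case-insensitive, hence grep -i.
    tr -d '\r' < "$1" |
        grep -qi '^cf-mitigated:[[:space:]]*challenge[[:space:]]*$'
}

# Usage (needs network access; substitute the real URL):
#   curl -s -L -A "Mozilla/5.0" -D headers.txt -o page.html "https://example.com/"
#   if is_cf_challenge headers.txt; then
#       echo "got the challenge page"
#   else
#       echo "got a normal response"
#   fi
```

With -L, headers.txt holds the headers of every response in the redirect
chain, so the check fires if any hop was a challenge.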
On Wed, 8 Jan 2025 at 16:23, toby via curl-users
<curl-users_at_lists.haxx.se> wrote:
>
> Maybe someone can help me with this
>
> dillo https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> gives a good page
>
> curl -b cookies.txt -c cookies.txt -A "Mozilla/5.0" -k -L -o crypto.html https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> results in a page (crypto.html) saying it needs javascript and cookies and the cookies.txt file is 'empty'
>
> dillo doesn't do cookies or javascript either
> w3m gives a good page - it does have
> --
> Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> Etiquette: https://curl.se/mail/etiquette.html
Received on 2025-01-08