Re: curl getting different html
From: toby via curl-users <curl-users_at_lists.haxx.se>
Date: Wed, 8 Jan 2025 10:59:15 -0700
hi
the page is actually gzipped - dillo can gunzip but curl can't by itself
(found out using w3m)
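
For reference: curl does not decode a compressed body unless asked to, and its --compressed option requests a compressed response and decodes it automatically. The sketch below shows how to check whether a saved file is still gzipped and decode it by hand. It uses a locally generated stand-in for crypto.html rather than the real page, and the output name crypto.decoded.html is only an illustrative choice.

```shell
#!/bin/sh
# Stand-in for a gzipped response body that curl saved without decoding.
# (Hypothetical local data; no network access is needed for this sketch.)
printf '<html>ok</html>' | gzip -c > crypto.html

# gzip streams start with the magic bytes 1f 8b.
magic=$(head -c 2 crypto.html | od -An -tx1 | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
    # Body is still compressed: decode it by hand.
    gunzip -c crypto.html > crypto.decoded.html
else
    # Body is already plain text: keep it as it is.
    cp crypto.html crypto.decoded.html
fi

# Against the real site, the alternative is to let curl do the decoding:
#   curl --compressed -L -o crypto.html "$url"
```

If the file turns out not to be gzipped, the problem is more likely the challenge page described in the quoted reply than the encoding.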
On Wed, 8 Jan 2025 16:45:12 +0100
Hans Henrik Bergan <divinity76+curl_at_gmail.com> wrote:
> The website is using the Cloudflare "Bot Fight Mode" thing which is
> "protecting the website against bots",
> dillo, w3m, and curl are all triggering the "are you a human?"
> challenge page, and none of them are capable of passing it.
>
> It is strange that your dillo is not triggering it, probably something
> to do with your Dillo's IP.
>
> Anyway, your best bet to automate anything on that page is with
> "headless chrome running in headful mode" - idk how CF is detecting
> it, but it detects headless chromium running headless as bots, but it
> does not detect headless-chromium-running-in-headful-mode as bots.
>
> On Wed, 8 Jan 2025 at 16:23, toby via curl-users
> <curl-users_at_lists.haxx.se> wrote:
> >
> > Maybe someone can help me with this
> >
> > dillo https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > gives a good page
> >
> > curl -b cookies.txt -c cookies.txt -A "Mozilla/5.0" -k -L -o crypto.html https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > results in a page (crypto.html) saying it needs javascript and cookies and the cookies.txt file is 'empty'
> >
> > dillo doesn't do cookies or javascript either
> > w3m gives a good page - it does have
> > --
> > Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> > Etiquette: https://curl.se/mail/etiquette.html
Received on 2025-01-08