
Re: curl getting different html

From: toby via curl-users <curl-users_at_lists.haxx.se>
Date: Wed, 8 Jan 2025 10:59:15 -0700

hi

the page is actually gzipped - dillo gunzips it by itself, but curl doesn't
unless told to (for example with --compressed)
(found out using w3m)
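
The gzip observation can be checked against the file curl already saved. The following is a minimal sketch, not something from the thread: the helper names `maybe_gunzip` and `read_body` are hypothetical, and it assumes the body is a plain gzip stream, which starts with the magic bytes 0x1f 0x8b.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return data decompressed if it looks like gzip, otherwise unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def read_body(path: str) -> str:
    """Read a saved response body (hypothetical helper) and decode it as text."""
    with open(path, "rb") as f:
        body = maybe_gunzip(f.read())
    # The charset is an assumption; undecodable bytes are replaced, not fatal.
    return body.decode("utf-8", errors="replace")
```

Calling `read_body("crypto.html")` on the file written by the quoted curl command should return readable HTML if the download was in fact gzipped. With curl itself, `--compressed` requests a compressed response and decodes it before writing the output.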


On Wed, 8 Jan 2025 16:45:12 +0100
Hans Henrik Bergan <divinity76+curl_at_gmail.com> wrote:

> The website is using the Cloudflare "Bot Fight Mode" thing which is
> "protecting the website against bots",
> dillo, w3m, and curl are all triggering the "are you a human?"
> challenge page, and none of them are capable of passing it.
>
> It is strange that your dillo is not triggering it, probably something
> to do with your Dillo's IP.
>
> Anyway, your best bet to automate anything on that page is
> "headless chromium running in headful mode" - idk how CF is detecting
> it, but it flags headless chromium running headless as a bot, while it
> does not flag headless-chromium-running-in-headful-mode as a bot.
>
> On Wed, 8 Jan 2025 at 16:23, toby via curl-users
> <curl-users_at_lists.haxx.se> wrote:
> >
> > Maybe someone can help me with this
> >
> > dillo https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > gives a good page
> >
> > curl -b cookies.txt -c cookies.txt -A "Mozilla/5.0" -k -L -o crypto.html https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > results in a page (crypto.html) saying it needs javascript and cookies and the cookies.txt file is 'empty'
> >
> > dillo doesn't do cookies or javascript either
> > w3m gives a good page - it does have
> > --
> > Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> > Etiquette: https://curl.se/mail/etiquette.html
Received on 2025-01-08