
Re: curl getting different html

From: toby via curl-users <curl-users_at_lists.haxx.se>
Date: Wed, 8 Jan 2025 10:41:40 -0700

On Wed, 8 Jan 2025 16:45:12 +0100
Hans Henrik Bergan <divinity76+curl_at_gmail.com> wrote:

thanks Henrik,


> The website is using the Cloudflare "Bot Fight Mode" thing which is
> "protecting the website against bots",
> dillo, w3c, and curl are all triggering the "are you a human?"


w3c -> w3m

i don't get the 'are you a human' page with any of them

> challenge page, and none of them are capable of passing it.
>
> It is strange that your dillo is not triggering it, probably something
> to do with your Dillo's IP.

IP meaning -> internet facing address and not intellectual property?

w3m also gets the page
all three are trying to get the page from the same internet facing address

btw how did you find out about Cloudflare running the 'bot fight mode'?
what is it 'checking' for? dillo doesn't have js or cookies, and all three use the same user-agent
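
One way to check whether a response comes from a Cloudflare challenge is to look at the response headers rather than the saved HTML. The sketch below is only an illustration: the function name looks_like_cf_challenge is hypothetical, and it assumes Cloudflare marks challenge responses with a "cf-mitigated: challenge" header and identifies itself with "server: cloudflare". Whether this particular site sends those headers has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of curl): reads HTTP response headers on
# stdin and reports whether they look like a Cloudflare challenge.
# Assumption: challenge responses carry "cf-mitigated: challenge".
looks_like_cf_challenge() {
    # Strip carriage returns and lowercase everything so matching is simple.
    headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
    if printf '%s\n' "$headers" | grep -q '^cf-mitigated: *challenge'; then
        echo "cloudflare challenge"
    elif printf '%s\n' "$headers" | grep -q '^server: *cloudflare'; then
        echo "cloudflare, no challenge header"
    else
        echo "no cloudflare headers"
    fi
}

# Usage (needs network access; -D - dumps headers to stdout, the body is discarded):
# curl -sS -o /dev/null -D - -A "Mozilla/5.0" \
#   "https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent" \
#   | looks_like_cf_challenge
```

Running the same check from each machine or network would show whether the challenge depends on where the request comes from.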

until this is figured out - i can use w3m to save the page 'automatically' in place of curl

>
> Anyway, your best bet to automate anything on that page is with
> "headless chrome running in headful mode" - idk how CF is detecting
> it, but it detects headless chromium running headless as bots, but it
> does not detect headless-chromium-running-in-headful-mode as bots.
>
> On Wed, 8 Jan 2025 at 16:23, toby via curl-users
> <curl-users_at_lists.haxx.se> wrote:
> >
> > Maybe someone can help me with this
> >
> > dillo https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > gives a good page
> >
> > curl -b cookies.txt -c cookies.txt -A "Mozilla/5.0" -k -L -o crypto.html https://www.podchaser.com/podcasts/crypto-corner-bitcoin-and-bloc-950963/episodes/recent
> > results in a page (crypto.html) saying it needs javascript and cookies and the cookies.txt file is 'empty'
> >
> > dillo doesn't do cookies or javascript either
> > w3m gives a good page - it does have
> > --
> > Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> > Etiquette: https://curl.se/mail/etiquette.html
Received on 2025-01-08