Re: curl getting different html

From: toby via curl-users <curl-users_at_lists.haxx.se>
Date: Fri, 10 Jan 2025 08:11:32 -0700

On Wed, 8 Jan 2025 22:09:41 +0100
Hans Henrik Bergan via curl-users <curl-users_at_lists.haxx.se> wrote:

> >I don't get the 'are you a human' page with any of them
>
> lucky you: Cloudflare's automated systems deem it unnecessary to
> challenge you. A variation of a "trustworthy IP range",
> "correct-looking headers",
> and "an SSL/TLS implementation behaving like a web browser" causes
> people to not get challenged at all (see
> https://github.com/lwthiker/curl-impersonate for more info on the
> SSL/TLS part)

I've had problems with Cloudflare sites in dillo and w3m (though not with podchaser): they just tell me I'm a bot and to go away.
They don't seem very consistent.

>
> >IP meaning -> internet facing address and not intellectual property?
>
> yes internet facing address.
>
> >btw how did you find out about Cloudflare running the 'bot fight mode'?
>
> It's behaving similarly to cloudflare pages in bot fight mode.
> It's running in some kind of white-label mode making cloudflare
> non-trivial to recognize,
> but I still recognize the javascript from the cloudflare challenge
> page, look at this javascript:
> - window._cf_chl_opt={cvId: '3',cZone: "www.podchaser.com",cType:
> 'managed',cRay: '8fef0e7b89c8b509'
>
> the `_cf_chl_opt` is a cloudflare challenge page variable name, `cRay`
> is a cloudflare id, I also recognize the title tag
> <title>Just a moment...</title>
> from other cloudflare-protected sites.
> It's definitely cloudflare. (cloudflare-in-hiding! never seen that before)
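
The fingerprints described above can be checked mechanically. The following is only a rough sketch, assuming the markers stay as quoted (`_cf_chl_opt`, the `cRay` field and the "Just a moment..." title); the function names are made up for illustration and this is not an official Cloudflare interface.

```python
import re

# Markers copied from the challenge page quoted above. This is a heuristic:
# the markup is not a documented interface and may change at any time.
CHALLENGE_MARKERS = ("_cf_chl_opt", "<title>Just a moment...</title>")


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML contains any of the known challenge markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


def extract_ray_id(html: str):
    """Return the cRay value from the _cf_chl_opt object, or None if absent."""
    match = re.search(r"cRay:\s*'([0-9a-f]+)'", html)
    return match.group(1) if match else None
```

Running both functions on a page saved with `curl -o` would show whether the response was the real site or the challenge page.
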
>
> >what is it 'checking' for ?
>
> a quick automated check that you're "probably not a robot" before
> giving you access to the real website.
> Stop robots but not humans from accessing the page, is the intention, probably.
>
> > the page is actually gzipped - dillo can gunzip but curl can't by itself
> > (found out using w3m)
>
> it's only gzipped if you give the appropriate request header, eg
> `Accept-Encoding: gzip, deflate`

Can curl give me a gzipped page if I use the correct header? (hint hint: can you tell me what to use with curl?)
It just gives me the 'you need javascript and cookies' page.
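
For reference, curl's `--compressed` option asks the server for a compressed response and decodes it automatically, while `-H 'Accept-Encoding: gzip, deflate'` only sends the header and leaves the body compressed. Neither is likely to get past the JavaScript and cookie check on its own. What the header negotiation amounts to can be sketched in Python; the helper names `decode_body` and `fetch` are hypothetical, and only gzip is handled here.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Gunzip the body if the server labelled it gzip, else return it as-is."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Request a gzip-encoded response and return the decoded bytes.

    Illustrative helper: it performs a real network request and does
    nothing to satisfy a JavaScript or cookie challenge.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

The server is free to ignore the request header, which is why the sketch checks `Content-Encoding` on the response instead of assuming the body is compressed.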
Received on 2025-01-10