Re: Strange redirection to Cloudflare server with Captcha?

From: Mac-Fly via curl-users <curl-users_at_cool.haxx.se>
Date: Sun, 16 May 2021 10:57:18 +0200

Dear Bruce et al.,

I am afraid you are right. I tried to follow the steps you described, but it seems quite difficult to find a good workaround. :-(

What is strange, though: I used a browser (Firefox) with a completely empty profile (no cache, no cookies, etc.) and I don't see the captcha. So although I am "faking" the user agent to match Firefox ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"), there must be something more that Firefox is doing, or that Cloudflare is checking, to avoid the captcha.

Until now, I wasn't aware that providers do such a strange thing. But as you said:
> There's nothing that says a target site has to play nice!!

So it's at least not a cURL issue. That is what I wanted to know.

Thanks a lot!

------------------------------

Date: Sat, 15 May 2021 00:24:54 -0400
From: bruce <badouglas_at_gmail.com>
To: the curl tool <curl-users_at_cool.haxx.se>
Subject: Re: Strange redirection to Cloudflare server with Captcha?
Message-ID:
        <CAP16ngrbPQcNgadVCkLzo3JHkjmPYfrRSB+i_CoU_CqcGfp1iA_at_mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi.

Welcome to the obnoxious world of crawling. A bunch of sites that sign
up for CDN solutions like Cloudflare are implementing reCAPTCHA/cookie
checks.

There's nothing that says a target site has to play nice!!

However, if you walk through the process, and if you inspect the
traffic, you can possibly determine steps around the issue.

I've discovered with various targets that implement captcha, that the
image/process is usually only happening at the beginning of the crawl
process. This allows me to perform a manual process where I manually
"solve" the captcha, and analyze the traffic to see what it's doing.

In most cases, the captcha process is creating a "cookie" file of some sort.

I've been able to "capture" the cookie, and to insert this in my
curl/crawl process and things more or less manage to work.

In fact, in most processes, the cookie lasts for hours/days and in
some cases I can craft a process that works through the site using
this method.
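
The capture-and-reuse approach described above might look roughly like the sketch below. It is an illustration only, not a tested workaround: the cookie name `cf_clearance` and its placeholder value are assumptions (copy whatever cookie your browser actually received after you solved the captcha manually), and `fetch_with_cookie` is a hypothetical helper name. The user agent is the one mentioned in this thread; sending the same one the browser used seems prudent, since the site may compare it with the cookie.

```shell
#!/bin/sh
# User agent of the browser in which the captcha was solved manually.
UA='Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'

# ASSUMPTION: cookie name and value are placeholders. Paste the real
# "name=value" pair from the browser's developer tools.
COOKIE='cf_clearance=PASTE_VALUE_FROM_BROWSER'

# Hypothetical helper: fetch a URL with the captured cookie attached.
fetch_with_cookie() {
    # -s  silent (no progress meter)
    # -L  follow redirects
    # -A  set the User-Agent header
    # -b  send the given "name=value" cookie string
    curl -sL -A "$UA" -b "$COOKIE" "$1"
}

# Usage (not run here):
#   fetch_with_cookie https://www.audacityteam.org
```

If the response is still the "Please Wait..." page, the cookie has probably expired or was rejected, and the manual step has to be repeated.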

Your mileage might vary.

Good Luck!



On Fri, May 14, 2021 at 11:26 PM Mac-Fly via curl-users
<curl-users_at_cool.haxx.se> wrote:
>
> Dear all,
>
> For a few days now I have had the problem that a few pages that used to work just fine now return a page that comes from Cloudflare and asks for a Captcha.
>
> For example, a call to:
> curl https://www.audacityteam.org
> ...returns a webpage with the title:
> <title>Please Wait... | Cloudflare</title>
> ...and this meta-data:
> <meta name="captcha-bypass" id="captcha-bypass" />
> ...and of course more content.
> Strangely, the same page called from any browser works just fine. And it _was_ working fine for ages!
> Also I tried to set the user agent to be exactly as my browser with no effect.
>
> Do you have any idea what may cause this and (more importantly) how to work around it? :-)
>
> The cURL I am using is:
> curl 7.76.1 (i386-pc-win32) libcurl/7.76.1 OpenSSL/1.1.1k (Schannel) zlib/1.2.11 brotli/1.0.9 zstd/1.4.9 WinIDN libssh2/1.9.0 nghttp2/1.43.0 libgsasl/1.10.0
>
> -----------------------------------------------------------
> Unsubscribe: https://cool.haxx.se/list/listinfo/curl-users
> Etiquette: https://curl.haxx.se/mail/etiquette.html


Received on 2021-05-16