Re: Strange redirection to Cloudflare server with Captcha?
From: bruce via curl-users <curl-users_at_cool.haxx.se>
Date: Sat, 15 May 2021 00:24:54 -0400
Hi.
Welcome to the obnoxious world of crawling. A bunch of sites that sign
up for CDN solutions like Cloudflare are implementing recaptcha/cookie
checks.
There's nothing that says a target site has to play nice!!
However, if you walk through the process, and if you inspect the
traffic, you can possibly determine steps around the issue.
I've discovered with various targets that implement captcha that the
image/process usually only happens at the beginning of the crawl
process. This lets me manually "solve" the captcha once and analyze
the traffic to see what it's doing.
In most cases, the captcha process creates a "cookie" of some sort.
I've been able to "capture" the cookie and insert it into my
curl/crawl process, and things more or less manage to work.
In fact, in most cases the cookie lasts for hours/days, and in
some cases I can craft a process that works through the site using
this method.
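As a rough illustration of the cookie reuse described above, here is a minimal Python sketch using only the standard library. Everything in it is an assumption on my part rather than something taken from a specific site: the helper names build_request and looks_like_challenge are hypothetical, the cookie name and value are placeholders for whatever you actually see in your browser's traffic, and the challenge check is only a heuristic based on the title and meta tag quoted in the original message. It may also matter that the User-Agent matches the browser that solved the captcha, but that is a guess to verify against your target.

```python
import urllib.request


def build_request(url, cookie_name, cookie_value, user_agent):
    """Build a request carrying a cookie captured from a browser session.

    Nothing is sent here. The cookie name and value are whatever was
    observed in the browser after solving the captcha by hand; they are
    placeholders, not values known to work for any given site.
    """
    req = urllib.request.Request(url)
    req.add_header("Cookie", f"{cookie_name}={cookie_value}")
    # Assumption: the site may expect the same User-Agent as the browser
    # that obtained the cookie.
    req.add_header("User-Agent", user_agent)
    return req


def looks_like_challenge(html):
    """Heuristic check for the interstitial page quoted in the original post.

    Returns True when the body contains the Cloudflare title suffix or the
    "captcha-bypass" meta tag, which suggests the cookie is missing or has
    expired and the manual step needs repeating.
    """
    lowered = html.lower()
    return "| cloudflare</title>" in lowered or 'name="captcha-bypass"' in lowered
```

To actually fetch the page you would pass the result to urllib.request.urlopen(req) and run the decoded body through looks_like_challenge. With the curl tool, the equivalent is to supply the same cookie with -b "name=value" and the browser's agent string with -A.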
Your mileage might vary.
Good Luck!
On Fri, May 14, 2021 at 11:26 PM Mac-Fly via curl-users
<curl-users_at_cool.haxx.se> wrote:
>
> Dear all,
>
> For a few days now I have had the problem that a few pages that used to work just fine now return a page that comes from Cloudflare and asks for a Captcha to be entered.
>
> For example, a call to:
> curl https://www.audacityteam.org
> ...returns a webpage with the title:
> <title>Please Wait... | Cloudflare</title>
> ...and this meta-data:
> <meta name="captcha-bypass" id="captcha-bypass" />
> ...and of course more content.
> Strangely, the same page called from any browser works just fine. And also, it _was_ working fine for ages!
> I also tried setting the user agent to exactly match my browser's, with no effect.
>
> Do you have any idea what may cause this and (more importantly) how to work around it? :-)
>
> The cURL I am using is:
> curl 7.76.1 (i386-pc-win32) libcurl/7.76.1 OpenSSL/1.1.1k (Schannel) zlib/1.2.11 brotli/1.0.9 zstd/1.4.9 WinIDN libssh2/1.9.0 nghttp2/1.43.0 libgsasl/1.10.0
>
> -----------------------------------------------------------
> Unsubscribe: https://cool.haxx.se/list/listinfo/curl-users
> Etiquette: https://curl.haxx.se/mail/etiquette.html
-----------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-users
Etiquette: https://curl.haxx.se/mail/etiquette.html
Received on 2021-05-15