
Re: Getting CAPTCHA response when download a webpage

From: Jiahao XU via curl-library <curl-library_at_cool.haxx.se>
Date: Tue, 21 Jul 2020 02:39:19 +0000

Then I don’t know what is missing either. 😂

________________________________
From: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Sent: Monday, July 20, 2020 6:44:31 PM
To: Jiahao XU <Jiahao_XU_at_outlook.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage

Line 8 of the downloaded sample, which contains the CAPTCHA response, is: <meta name="captcha-bypass" id="captcha-bypass" />

I extracted the headers used by Firefox and Chrome, and I also captured the headers used by the KGet download manager. I used them with curl and got the same result, so I am still not sure what is missing.

On Mon, Jul 20, 2020 at 12:27 PM Jiahao XU <Jiahao_XU_at_outlook.com> wrote:
I think I know why curl’s output is so different from Chrome’s:
the request headers are different.

To be specific, I think it is probably the MIME types that matter.

Chrome might send the MIME types it supports in the request headers, signalling that an HTML5 video player or something similar is available.

Without such MIME settings, the site probably won’t send the right HTML page.

I am nowhere near familiar with Chrome or HTML5 video players, but I suppose the request headers play an important role here.
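
One way to test this hypothesis is to diff the request headers a browser sends against the ones curl sends (curl prints its outgoing headers when run with `-v`). The following is a minimal Python sketch and not part of the original exchange; the function names `parse_headers` and `diff_headers` and the sample header values are made up for illustration.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip lines that are not plain header fields, e.g. "GET / HTTP/1.1".
        if sep and name and " " not in name:
            headers[name.lower()] = value.strip()
    return headers


def diff_headers(browser_raw: str, curl_raw: str) -> dict:
    """Report headers the browser sent that curl did not, or sent differently."""
    browser = parse_headers(browser_raw)
    curl = parse_headers(curl_raw)
    shared = set(browser) & set(curl)
    return {
        "missing_in_curl": sorted(set(browser) - set(curl)),
        "different": sorted(k for k in shared if browser[k] != curl[k]),
    }


# Illustrative values only; real captures will contain many more fields.
browser_sample = "Accept: text/html,application/xhtml+xml\nAccept-Language: en-GB"
curl_sample = "GET / HTTP/1.1\nAccept: */*"
print(diff_headers(browser_sample, curl_sample))
```

Any header reported as missing or different could then be supplied to curl with `-H` to see whether the response changes. Matching headers may still not be enough, since a site can also look at signals that are not in the headers at all.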

________________________________
From: Jiahao XU <Jiahao_XU_at_outlook.com>
Sent: Monday, July 20, 2020 8:22:32 PM
To: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage

You seem to have a typo: it should be “captcha”, not “capcha”.

________________________________
From: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Sent: Monday, July 20, 2020 8:18:09 PM
To: Jiahao XU <Jiahao_XU_at_outlook.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage

I set useragent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes of output.

Those 9393 bytes are an HTML page containing the CAPTCHA response, not the actual web page, which should be around 130 KB.
To make sure, open the downloaded content in any editor and search for the word “capcha” or “Cloudflare”.
You will also get the same result with curl without setting any headers or user agent.
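
The manual check described above can also be scripted. Below is a minimal Python sketch and not part of the original exchange; the function name `looks_like_challenge_page`, the marker list, and the 20 KB size threshold are illustrative assumptions based on the sizes reported in this thread (about 9 KB for the challenge page versus about 130 KB for the real page).

```python
# Hypothetical helper: heuristically flag a downloaded body as a challenge page.
# The markers and the size threshold are assumptions taken from this thread,
# not anything documented by Cloudflare or curl.
CHALLENGE_MARKERS = ("captcha", "cloudflare")
SIZE_THRESHOLD = 20 * 1024  # bytes; assumed cut-off between ~9 KB and ~130 KB


def looks_like_challenge_page(body: bytes, size_threshold: int = SIZE_THRESHOLD) -> bool:
    """Return True if the body is small and mentions a challenge marker."""
    # Lower-case the text so "Cloudflare" and "CAPTCHA" match regardless of case.
    text = body.decode("utf-8", errors="replace").lower()
    has_marker = any(marker in text for marker in CHALLENGE_MARKERS)
    return has_marker and len(body) < size_threshold
```

For example, after saving curl's output to a file (say a hypothetical `page.html`), calling `looks_like_challenge_page(open("page.html", "rb").read())` would be expected to return `True` for the 9393-byte response described here, provided it contains one of the markers.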

I tried a user agent and a cookie file from a browser, and nothing seems to work.

Regards,
Mahmoud

On Jul 20, 2020, at 9:55 AM, Jiahao XU <Jiahao_XU_at_outlook.com> wrote:

IMHO it could be the user agent and cookies affecting the output.

I set useragent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes of output.

However, when I tried with wget, I got 403 Forbidden.

________________________________
From: curl-library <curl-library-bounces_at_cool.haxx.se> on behalf of Mah. E. via curl-library <curl-library_at_cool.haxx.se>
Sent: Monday, July 20, 2020 8:38:47 AM
To: curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Cc: Mah. E. <mahmoud.aboghazala_at_gmail.com>; curl-library_at_cool.haxx.se <curl-library_at_cool.haxx.se>
Subject: Getting CAPTCHA response when download a webpage

First, thanks for this awesome tool.

Is there any way to download this web page using curl?

https://www.crunchyroll.com/en-gb/blood-blockade-battlefront/episode-6-get-the-lock-out-754075

I ask because I get a Cloudflare CAPTCHA HTML response (an 8 KB file),
but if I use a different downloader such as KGet or IDM, I get the actual page (131 KB file size).

I can open this link in any browser and it never asks for a reCAPTCHA.
I also tried using the full headers from the browser with the curl command, with no success.

Any help would be appreciated.

Received on 2020-07-21