Re: Getting CAPTCHA response when download a webpage

From: Mah. E. via curl-library <curl-library_at_cool.haxx.se>
Date: Mon, 20 Jul 2020 10:44:31 +0200

This is line 8 of the downloaded sample, which contains the CAPTCHA response:
<meta name="captcha-bypass" id="captcha-bypass" />

I extracted the headers used by Firefox and Chrome, and I also captured the
headers used by the KGet download manager. I used them with curl and got the
same result. I am still not sure what is missing.
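
Rather than opening each download in an editor, the check can be scripted by
searching the file for the markers mentioned in this thread. The Python sketch
below is only an illustration: the function name looks_like_captcha_page and
the marker list are assumptions, not part of curl, and a genuine page that
merely mentions Cloudflare would be flagged as well.

```python
# Hypothetical helper: the name and the marker list are illustrative only.
CHALLENGE_MARKERS = ("captcha", "cloudflare")


def looks_like_captcha_page(html: str) -> bool:
    """Return True if the HTML contains any of the challenge-page markers."""
    lowered = html.lower()  # match case-insensitively
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


if __name__ == "__main__":
    # The meta tag quoted above, taken from line 8 of the downloaded sample.
    sample = '<meta name="captcha-bypass" id="captcha-bypass" />'
    print(looks_like_captcha_page(sample))
```

To check a saved download, read the file into a string first and pass it to
the function.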

On Mon, Jul 20, 2020 at 12:27 PM Jiahao XU <Jiahao_XU_at_outlook.com> wrote:

> I think I know why curl’s output is so different from Chrome’s:
> the request headers are different.
>
> To be specific, I think it is probably the MIME types that matter.
>
> Chrome probably sends the MIME types it supports in the request
> headers, signalling that an HTML5 video player or something similar is
> available.
>
> Without such MIME settings, the site probably won’t send the right HTML
> page.
>
> I am not at all familiar with Chrome or the HTML5 video player, but I suppose
> that the request headers play an important role here.
>
> ------------------------------
> *From:* Jiahao XU <Jiahao_XU_at_outlook.com>
> *Sent:* Monday, July 20, 2020 8:22:32 PM
> *To:* Mah. E. <mahmoud.aboghazala_at_gmail.com>
> *Cc:* libcurl development <curl-library_at_cool.haxx.se>;
> curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
> *Subject:* Re: Getting CAPTCHA response when download a webpage
>
> You seem to have a typo: it should be "captcha", not "capcha".
>
> ------------------------------
> *From:* Mah. E. <mahmoud.aboghazala_at_gmail.com>
> *Sent:* Monday, July 20, 2020 8:18:09 PM
> *To:* Jiahao XU <Jiahao_XU_at_outlook.com>
> *Cc:* libcurl development <curl-library_at_cool.haxx.se>;
> curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
> *Subject:* Re: Getting CAPTCHA response when download a webpage
>
>
> I set useragent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36
> (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes
> of output.
>
>
> These 9393 bytes are an HTML page containing the CAPTCHA response, not the
> actual web page, which should be around 130 KB.
> To confirm, open the downloaded content in any editor and search for the
> word “capcha” or “Cloudflare”.
> You will also get the same result with curl without setting any headers or
> user agent.
>
> I tried a user agent and a cookie file from a browser, and nothing seems to work.
>
> Regards,
> Mahmoud
>
> On Jul 20, 2020, at 9:55 AM, Jiahao XU <Jiahao_XU_at_outlook.com> wrote:
>
> IMHO it could be the user agent and cookies affecting the output.
>
> I set useragent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36
> (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes
> of output.
>
> However, when I tried with wget, I got 403 forbidden.
>
> ------------------------------
> *From:* curl-library <curl-library-bounces_at_cool.haxx.se> on behalf of
> Mah. E. via curl-library <curl-library_at_cool.haxx.se>
> *Sent:* Monday, July 20, 2020 8:38:47 AM
> *To:* curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
> *Cc:* Mah. E. <mahmoud.aboghazala_at_gmail.com>; curl-library_at_cool.haxx.se <
> curl-library_at_cool.haxx.se>
> *Subject:* Getting CAPTCHA response when download a webpage
>
> First, thanks for this awesome tool.
>
> Is there any way to download this web page using curl?
>
>
> https://www.crunchyroll.com/en-gb/blood-blockade-battlefront/episode-6-get-the-lock-out-754075
>
> I ask because I get a Cloudflare CAPTCHA HTML response (an 8 KB file),
> but if I use a different downloader such as KGet or IDM, I get the actual
> page (a 131 KB file).
>
> I can open this link in any browser and it never asks for a reCAPTCHA.
> I also tried using the full headers from the browser in the curl command,
> with no success.
>
> Any help would be appreciated.
>
>

Received on 2020-07-20