Re: Getting CAPTCHA response when download a webpage
Date: Tue, 21 Jul 2020 02:39:19 +0000
Then I don't know what is missing either. 😂
________________________________
From: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Sent: Monday, July 20, 2020 6:44:31 PM
To: Jiahao XU <Jiahao_XU_at_outlook.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage
Line 8 of the downloaded sample, which contains the CAPTCHA response, is: <meta name="captcha-bypass" id="captcha-bypass" />
I extracted the headers used by Firefox and Chrome, and I also captured the headers used by the KGet download manager. I used them with curl and got the same result, so I am still not sure what is missing.
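For anyone who wants to repeat this experiment, one way to replay captured browser headers with curl is to generate the command line from a header table. The sketch below is only an illustration: the helper name build_curl_command, the output file name and the Accept and Accept-Language values are assumptions, not the headers actually captured in this thread. Only the URL and the User-Agent string come from the messages quoted here. It relies on curl's --header, --location and --output options.

```python
import shlex


def build_curl_command(url, headers, output_path):
    """Return a curl argument list that sends each header via --header."""
    args = ["curl", "--location", "--output", output_path]
    for name, value in headers.items():
        args += ["--header", f"{name}: {value}"]
    args.append(url)
    return args


# Hypothetical header set; replace it with the headers copied from the browser.
browser_headers = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",  # assumed value
    "Accept-Language": "en-GB,en;q=0.9",  # assumed value
}

command = build_curl_command(
    "https://www.crunchyroll.com/en-gb/blood-blockade-battlefront/"
    "episode-6-get-the-lock-out-754075",
    browser_headers,
    "page.html",  # assumed output file name
)

# Print a copy-pasteable shell command; nothing is downloaded here.
print(shlex.join(command))
```

As reported above, matching the headers alone did not change the result for this site, so this mainly makes the test easy to repeat with different header sets.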
On Mon, Jul 20, 2020 at 12:27 PM Jiahao XU <Jiahao_XU_at_outlook.com> wrote:
I think I know why curl's output is so different from Chrome's:
The request header is different.
To be specific, I think it is probably the MIME types that matter.
Chrome might have sent the MIME types it supports in the request header, signalling that an HTML5 video player or something similar is available.
Without such MIME settings, the site probably won't send the right HTML page.
I am nowhere near familiar with Chrome or HTML5 video players, but I suppose the request header plays an important role here.
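To check this guess, the two sets of request headers can be compared side by side. The following is a rough sketch; the function name diff_headers and all header values in the example are made up for illustration, except that curl does send "Accept: */*" by default. Header names are compared case-insensitively.

```python
def diff_headers(browser, tool):
    """Return browser headers that the tool is missing or sends differently.

    The result maps each header name to (browser_value, tool_value), where
    tool_value is None when the tool does not send that header at all.
    """
    tool_lower = {name.lower(): value for name, value in tool.items()}
    differences = {}
    for name, value in browser.items():
        other = tool_lower.get(name.lower())
        if other != value:
            differences[name] = (value, other)
    return differences


# Assumed example values, not captured from a real session.
chrome_headers = {
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-GB,en;q=0.9",
}
curl_headers = {
    "accept": "*/*",  # curl's default Accept header
}

for name, (browser_value, tool_value) in diff_headers(chrome_headers, curl_headers).items():
    print(f"{name}: browser={browser_value!r} tool={tool_value!r}")
```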
________________________________
From: Jiahao XU <Jiahao_XU_at_outlook.com>
Sent: Monday, July 20, 2020 8:22:32 PM
To: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage
You seem to have a typo: it should be captcha, not capcha.
________________________________
From: Mah. E. <mahmoud.aboghazala_at_gmail.com>
Sent: Monday, July 20, 2020 8:18:09 PM
To: Jiahao XU <Jiahao_XU_at_outlook.com>
Cc: libcurl development <curl-library_at_cool.haxx.se>; curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Subject: Re: Getting CAPTCHA response when download a webpage
I set the user agent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes of output.
Those 9393 bytes are an HTML page containing the CAPTCHA response, not the actual web page, which should be around 130 KB.
To make sure, open the downloaded content in any editor and search for the word “capcha” or “Cloudflare”.
You will also get the same result with curl without setting any headers or user agent.
I tried the user agent and a cookie file from a browser, and nothing seems to work.
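The manual check described above (open the file and search for the marker words) can also be scripted. This is a minimal sketch: the function name looks_like_captcha_page is made up for this example, and a case-insensitive search for "captcha" or "cloudflare" is only a heuristic, since a genuine page could also mention either word.

```python
def looks_like_captcha_page(html):
    """Return True if the text mentions 'captcha' or 'cloudflare' (any case)."""
    lowered = html.lower()
    return any(marker in lowered for marker in ("captcha", "cloudflare"))


# The meta tag reported elsewhere in this thread as line 8 of the download.
sample = '<meta name="captcha-bypass" id="captcha-bypass" />'

if looks_like_captcha_page(sample):
    print("This looks like the CAPTCHA page, not the real content.")
```

To test a saved download instead, read the file into a string and pass it to the same function.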
Regards,
Mahmoud
On Jul 20, 2020, at 9:55 AM, Jiahao XU <Jiahao_XU_at_outlook.com> wrote:
IMHO the user agent and cookies could be affecting the output.
I set the user agent to “Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36” and got 9393 bytes of output.
However, when I tried with wget, I got 403 Forbidden.
________________________________
From: curl-library <curl-library-bounces_at_cool.haxx.se> on behalf of Mah. E. via curl-library <curl-library_at_cool.haxx.se>
Sent: Monday, July 20, 2020 8:38:47 AM
To: curl-users_at_cool.haxx.se <curl-users_at_cool.haxx.se>
Cc: Mah. E. <mahmoud.aboghazala_at_gmail.com>; curl-library_at_cool.haxx.se <curl-library_at_cool.haxx.se>
Subject: Getting CAPTCHA response when download a webpage
First, thanks for this awesome tool.
Is there any way to download this web page using curl?
https://www.crunchyroll.com/en-gb/blood-blockade-battlefront/episode-6-get-the-lock-out-754075
With curl I get a Cloudflare CAPTCHA HTML response (an 8 KB file),
but if I use a different downloader such as KGet or IDM, I get the actual page (a 131 KB file).
I can open this link in any browser and it never asks for a reCAPTCHA.
I also tried using the full headers from the browser with the curl command, with no success.
Any help would be appreciated.
Received on 2020-07-21