curl-users
Re: curl issue...
Date: Mon, 4 Jul 2016 01:37:22 -0400
On 7/3/2016 9:12 AM, bruce wrote:
> weirdness abounds..
>
> wget -- this works.. consistently..
>
> echo '' > a.lwp
> wget -vvv --user-agent "Mozilla/5.0 (compatible; MSIE 9.0; Windows
> NT 6.1; WOW64; Trident/5.0; chromeframe/12.0.742.112)" --cookies=on
> --load-cookies=a.lwp --keep-session-cookies
> --save-cookies=a.lwp -O - "http://www.bkstr.com"
>
> wget -vvv --user-agent "Mozilla/5.0 (compatible; MSIE 9.0; Windows
> NT 6.1; WOW64; Trident/5.0; chromeframe/12.0.742.112)" --cookies=on
> --load-cookies=a.lwp --keep-session-cookies
> --save-cookies=a.lwp -O -
> "http://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d"
>
>
> curl -- this hangs, it appears to be a cookie thing with the 1st/2nd curls
> echo '' > a.lwp
> curl -vvv -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
> Firefox/38.0" --cookie-jar 'a.lwp' -L "http://www.bkstr.com"
>
> curl -vvv -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
> Firefox/38.0" --cookie a.lwp --cookie-jar a.lwp -L
> "https://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d"
>
>
> I've tested using single/double quotes around the curl file. I've
> tested using --cookie as well as --cookie-jar in both 1st and 2nd
> curls. It appears to be a consistent issue with the server/curl
> generating the cookies/string the returned cookies.
>
> When I just test the 1st, I still don't get consistent cookies back.
>
> Again, running the wget seems to consistently work with regards to the
> cookie issue.
>
> thoughts/comments..
Please don't top post [1]; it makes the conversation harder to follow.
If you are using the Windows command prompt, single quotes are part of
the argument, so when you write 'a.lwp' curl actually writes to a file
whose name has ' as its first and last character. Next you attempt to
read a.lwp without quotes, and curl can't find it because the file you
actually have is named 'a.lwp'.
You are dealing with caching servers, as I mentioned to you last month
regarding this issue [2][3].
Vary: Accept-Encoding,User-Agent
That is basically the server telling you different content MAY be served
depending on what is sent for Accept-Encoding and User-Agent. It is
helpful to caches, which can use that information to determine how to
cache the page. In your case the website is heavy on javascript and IIRC
it was using at least F5 in some setup where, on a cache miss, it sends
javascript to be executed immediately. That javascript then makes
follow-up requests, which cause the actual page to be returned, with
cookies. That page should then be cached for that agent/encoding
combination, and future requests will return that page as long as it
hasn't expired.
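To check which request headers a response says it varies on, you can pull
the Vary line out of the response headers. This is only a rough sketch:
vary_fields is a made-up helper name, and it just parses whatever headers
are fed to it on stdin.

```shell
# vary_fields: read raw HTTP response headers on stdin and print each
# header name listed in Vary, one per line. (Hypothetical helper name.)
vary_fields() {
  tr -d '\r' |
    awk 'tolower($0) ~ /^vary:/ {
           sub(/^[^:]*:[ \t]*/, "")      # drop the "Vary:" prefix
           sub(/[ \t]+$/, "")            # drop trailing whitespace
           n = split($0, f, /[ \t]*,[ \t]*/)
           for (i = 1; i <= n; i++) if (f[i] != "") print f[i]
         }'
}

# Usage (needs network access; output depends on what the server sends):
#   curl -sI -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0" "http://www.bkstr.com" | vary_fields
```

For the Vary header quoted above this would list Accept-Encoding and
User-Agent, which are the two request headers the cache keys on.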
wget appears to work here because you are using a different user-agent,
one for which the caching server already has a cached version of bkstr.
The extra javascript isn't sent; instead you get the cached version and
the right cookies. The user agent may continue to work until it doesn't.
You're really relying on someone in their browser using that user agent
on that website on that page often enough that it stays in the cache.
As I mentioned last month, at the very least you need a JSESSIONID cookie
to avoid the hang, and I think the fastest way is to just supply a blank
one. The server will either ignore it or return a valid one.
Initially create one like this:
printf "www.bkstr.com\tFALSE\t/\tFALSE\t0\tJSESSIONID\t" > a.lwp
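For reference, that line follows the Netscape cookie-file layout that curl
reads: seven tab-separated fields (domain, include-subdomains flag, path,
secure flag, expiry, name, value), with the value left empty here. A small
sketch that writes the same entry and sanity-checks the field count; both
function names are made up for illustration, and unlike the printf above
it ends the line with a newline.

```shell
# make_blank_cookie: write a cookie file holding one blank cookie in the
# Netscape cookie-file layout. (Hypothetical helper name.)
#   $1 = cookie file, $2 = host, $3 = cookie name
make_blank_cookie() {
  printf '%s\tFALSE\t/\tFALSE\t0\t%s\t\n' "$2" "$3" > "$1"
}

# count_cookie_fields: print the number of tab-separated fields on each
# non-comment, non-empty line of the cookie file.
#   $1 = cookie file
count_cookie_fields() {
  awk -F '\t' '!/^#/ && NF { print NF }' "$1"
}

# Usage:
#   make_blank_cookie a.lwp www.bkstr.com JSESSIONID
#   count_cookie_fields a.lwp
```

A well-formed entry should report 7 fields; anything else suggests a tab
got lost or turned into spaces along the way.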
Then on future requests you should be able to just do this:
curl -v -b a.lwp -c a.lwp -L \
  -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0" \
  "https://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d"
If it's more complicated than that then try something like casperjs,
which you may have done already [4].
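If you script the curl request, it may help to put a time limit on it so
that a hang shows up as an error rather than blocking forever. A sketch
only: the helper names and the 30-second limit are my own choices, while
--max-time and exit code 28 (operation timed out) are standard curl.

```shell
# fetch_with_timeout: run the request with a hard time limit.
# (Hypothetical helper; the 30-second limit is arbitrary.)
#   $1 = cookie file, $2 = URL
fetch_with_timeout() {
  curl -sS --max-time 30 -b "$1" -c "$1" -L \
    -A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0" \
    "$2"
}

# describe_curl_exit: translate the curl exit codes relevant here.
#   $1 = exit code from curl
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    28) echo "timed out" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Usage (needs network access):
#   fetch_with_timeout a.lwp "https://www.bkstr.com/webapp/wcs/stores/servlet/LocateCourseMaterialsServlet?requestType=INITIAL&storeId=432905&demoKey=d" > page.html
#   describe_curl_exit "$?"
```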
[1]: https://curl.haxx.se/mail/etiquette.html#Do_Not_Top_Post
[2]: https://curl.haxx.se/mail/archive-2016-05/0011.html
[3]: https://curl.haxx.se/mail/archive-2016-05/0027.html
[4]: https://curl.haxx.se/mail/archive-2016-05/0026.html
Received on 2016-07-04