Re: Newbie's question on cURL usage

From: Yanhui Liu <yanhui_liu_at_yahoo.com>
Date: Fri, 01 Mar 2002 08:23:21 -0800

Wow...!! Such a wild ride, and amazingly it worked! Thanks a lot, Ralph.
This definitely makes my day brighter :-)

Looking at the solution, I now understand a little better how to use cURL:
just follow the chain of redirects (form actions), keeping the cookie
information along the way, until you reach the target. Am I right?
And can this approach be applied to every situation?
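
To check my understanding, here is a minimal sketch of that pattern as I read
it from Ralph's script. The function name fetch_step and the variables CURL,
COOKIE_JAR and USER_AGENT are my own illustrative choices, not anything curl
provides; only the -s, -L, -A, -b, -c, -o, -d and --url options come from the
script quoted below.

```shell
#!/bin/sh
# Hypothetical wrapper: every hop reads and writes the same cookie file
# and follows HTTP Location redirects (-L).
CURL=${CURL:-curl}
COOKIE_JAR=${COOKIE_JAR:-cookies}
USER_AGENT=${USER_AGENT:-"Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"}

fetch_step() {
    # $1 = URL to fetch, $2 = output file, remaining args = extra curl options
    step_url=$1
    step_out=$2
    shift 2
    "$CURL" -s -L -A "$USER_AGENT" \
        -b "$COOKIE_JAR" -c "$COOKIE_JAR" \
        -o "$step_out" "$@" --url "$step_url"
}

# Intended use (needs network access, so it is left commented out):
#   : > "$COOKIE_JAR"                                  # start with an empty cookie file
#   fetch_step "http://example.com/target" page0.html
#   fetch_step "$login_url" page1.html -d "login=...&passwd=..."
```

Each call then differs only in the URL, the output file and any POST data,
which is all that changes between the steps of the script.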

Also, I really think all these steps are too much for beginners. Would it be
possible for cURL to automate this process? We know the input (login
info) and the output (target page); cURL could handle all the mess internally
instead of shifting the burden to the user. Is this technically achievable?
Maybe it is just a naive idea.
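
As far as I can tell, -L only follows HTTP Location headers, while the form
action and the META refresh targets sit inside the HTML, so for now that part
has to be scripted. A rough sketch of the two extraction steps follows; the
helper names form_action_url and meta_refresh_url are made up for
illustration, and it assumes double-quoted attributes with each tag on a
single line, as in the pages Ralph's script handles.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of curl).

# Print the URL of the last action="..." attribute found in an HTML file.
form_action_url() {
    grep -i 'action="' "$1" | tail -n 1 |
        sed -e 's/^.*[Aa][Cc][Tt][Ii][Oo][Nn]="//' -e 's/".*//'
}

# Print the target of the first META refresh (content="0; URL=...") in an HTML file.
meta_refresh_url() {
    grep -i 'http-equiv' "$1" | grep -i 'refresh' | head -n 1 |
        sed -e 's/^.*[Uu][Rr][Ll]=//' -e 's/".*//'
}
```

A script could call these on each saved page and pass the result to the next
curl invocation via --url, stopping once neither helper prints anything.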

Anyway, thanks again for the help.

Yanhui

PS: An examples section could be added somewhere in the documentation;
Ralph's solution would benefit quite a few people.

At 05:21 AM 2/28/02 -0600, you wrote:
>It looks to me like you might be posting to the wrong URL... And
>possibly starting from the wrong location...
>
>Try this:
>
>=========
> # Start with an empty cookie file
> cat /dev/null > cookies
>
> # Start by trying to get the final target - provoke MSFT into making you login
> curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
>   -c cookies -L -o money0.html \
>   --url "http://Moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=orcl"
>
> # Pull the login url out of the file - yep, really the last one, who knows why
> url=`grep -i action money0.html | tail -1 | sed -e 's/^.*action="//' -e 's/".*//'`
>
> # Login with all the bells and whistles from the previous page
> curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
>   -c cookies -L -o money1.html \
>   -d "notinframe=1&login=llp_gapper_at_yahoo.com&passwd=ladder&sec=rem&submit1=+Sign+In+&mspp_shared=1" \
>   --url "$url"
>
> # Pull the META REFRESH tag and fetch that
> url=`grep URL money1.html | sed -e 's/^.*URL=//' -e 's/".*//'`
>
> curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
>   -c cookies -L -o money2.html \
>   --url "$url"
>
> # Pull another META REFRESH tag and fetch that too
> url=`grep url money2.html | sed -e 's/^.*url=//' -e 's/".*//'`
>
> curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
>   -i -c cookies -L -o money3.html \
>   --url "$url"
>
> # money3.html should contain the target page...
>=================
>
>Works for me most of the time... :) Sometimes I get a Location header
>in money3.html that contains "?Error=TooManyResets" instead of the stock
>quote page. Dunno why curl didn't follow that Location header...
>
>Seems like when it fails, the money3.html file comes out at around 563
>bytes. Running the script again generally gets the proper result, which
>is over 17Kb big.
>
>Ralph Mitchell
>curl 7.9.5-pre4 (i686-pc-linux-gnu) libcurl 7.9.5-pre4 (OpenSSL 0.9.6c)
>
>Yanhui Liu wrote:
>
> > At 05:25 PM 2/27/02 +0100, you wrote:
> >
> >> On Wed, 27 Feb 2002, Yanhui Liu wrote:
> >>
> >> > I am trying to access stock quote page on msn.com site using cURL, however
> >> > I could not get it to work. Could you help me out? I am using curl 7.9.4
> >> > (i686-pc-linux-gnu) libcurl 7.9.4 (OpenSSL 0.9.6) on Redhat Linux 7.1.
> >>
> >> 7.9.4 has a SSL read bug that will make it unreliable for SSL downloads. You
> >> should get a 7.9.5 pre-release instead, as it will work better.
> >>
> >> It would help a lot if you first of all upgraded and retried this, then come
> >> back if it still doesn't work.
> >
> >
> > I upgraded to 7.9.5-pre4, however I still could not get the target
> > page. Test results are attached at the end.
> >
> >
> >> I would also appreciate if you would be able to cut out a few issues at a
> >> time and ask specifically about them. It is very hard, and breath-taking to
> >> only parse through such a huge mail with many complex command lines involved.
> >
> >
> > Sorry about the lengthy mail, I just want to present all relevant
> > information for the problem. For me, I could not cut the problem into
> > pieces, I am totally lost.
> >
> >
> >> > 1. Is it possible for curl to use Netscape's cookie? So we can get to the
> >> > client state using Netscape as a tool.
> >>
> >> Yes, curl can read Netscape's cookies. Use -b for reading and -c can even
> >> write them back in Netscape format.
> >
> >
> > Wonderful. Does it mean curl can use netscape's cookie to retrieve
> > pages? For example, I ran the curl test using netscape's cookie, which
> > was generated after browsing the content.
> >
> > $ cp ~/.netscape/cookies .
> > $ curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" \
> >     -b cookies -L -v -i -s -o junk.DATA \
> >     --url "http://moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=orcl"
> >
> > * Connected to moneycentral.com (207.46.189.14)
> > > GET /investor/quotes/pprtq.asp?Page=RTQ&Symbol=orcl HTTP/1.1
> > User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
> > Host: moneycentral.msn.com
> > Pragma: no-cache
> > Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
> > Cookie: QUAUTH=66504305614a51425d57755e515248404e585a5478525906280e40640357114307215e07164b43455009; MC1=GUID=00D22A5A9CA043E6A8730ABFDF1F3195
> >
> > * Follow to new URL: /pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dorcl%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False
> >
> > * Closing connection #0
> > * Follows Location: to new URL: 'http://moneycentral.msn.com/pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dorcl%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False'
> >
> > * Disables POST, goes with GET
> > * Connected to moneycentral.com (207.46.189.14)
> > > GET /pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dorcl%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False HTTP/1.1
> > User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
> > Host: moneycentral.msn.com
> > Pragma: no-cache
> > Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
> > Cookie: QUAUTH=66504305614a51425d57755e515248404e585a5478525906280e40640357114307215e07164b43455009; MC1=GUID=00D22A5A9CA043E6A8730ABFDF1F3195
> >
> > * Follow to new URL: http://login.passport.com/login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dorcl%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014877476&ver=2.0.0248.1&tpf=5b566e78f24697f753fdd1b608fd10b3
> >
> > * Closing connection #0
> > * Follows Location: to new URL: 'http://login.passport.com/login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dorcl%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014877476&ver=2.0.0248.1&tpf=5b566e78f24697f753fdd1b608fd10b3'
> >
> > * Disables POST, goes with GET
> > * Connected to login.passport.com (64.4.60.254)
> > > GET /login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dorcl%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014877476&ver=2.0.0248.1&tpf=5b566e78f24697f753fdd1b608fd10b3 HTTP/1.1
> > User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
> > Host: login.passport.com
> > Pragma: no-cache
> > Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
> > Cookie: MSPPre=llp_gapper_at_yahoo.com
> >
> > * Closing connection #0
> >
> > I still ended up at the login page. The content of the cookie file
> > from netscape was shown in my first posting. So curl does not pass the
> > cookie back to server correctly.

Received on 2002-03-01