curl-users

Newbie's question on cURL usage

From: Yanhui Liu <yanhui_liu_at_yahoo.com>
Date: Wed, 27 Feb 2002 07:00:52 -0800

Experts,

I am trying to access a stock quote page on the msn.com site using cURL, but
I cannot get it to work. Could you help me out? I am using curl 7.9.4
(i686-pc-linux-gnu) libcurl 7.9.4 (OpenSSL 0.9.6) on Red Hat Linux 7.1.

With Netscape, I can get the content in two steps.
Step 1. Log in at
        http://login.passport.com/login.srf
        using llp_gapper_at_yahoo.com/ladder as username/password.

        After this step, I can see the following in Netscape's cookie file:

kcookie.netscape.com FALSE / FALSE 4294967295 kcookie <script>location="."</script><script>do{}while(true)</script>
.passport.com TRUE / FALSE 2145801601 MSPPre llp_gapper_at_yahoo.com

Step 2. Access the page at
        http://moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=msft

        After getting the page, two more lines are added to the cookie file:

kcookie.netscape.com FALSE / FALSE 4294967295 kcookie <script>location="."</script><script>do{}while(true)</script>
.passport.com TRUE / FALSE 2145801601 MSPPre llp_gapper_at_yahoo.com
.msn.com TRUE / FALSE 1604217640 MC1 GUID=00D22A5A9CA043E6A8730ABFDF1F3195
moneycentral.msn.com TRUE / FALSE 1288598440 QUAUTH 65ccfb98c01cf2cc0bf4c4eff273d0a1d5c8becfc267703d87bbc062c642fe6146c64f42e1e0f9bf58

I tried to simulate these two steps using cURL, but failed. First, I
used Dan's formfind.pl program to get information on the login page:

$ ./formfind.pl http://login.passport.com/login.srf

.... SKIP several forms ....

--- FORM report. Uses POST to URL
"https://login.passport.com/ppsecure/post.srf?lc=1033&id=3&ru=http://memberservices.passport.com/memberservice.srf%3flc%3d1033&tw=20&da=passport.com"
Input: notinframe=1 (HIDDEN)
Input: login=<Type (TEXT)
Input: passwd (PASSWORD)
Input: sec=rem (CHECKBOX)
Input: submit1 (SUBMIT)
Input: mspp_shared=1 (CHECKBOX)
--- end of FORM
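
Note that the report gives the form's POST target as the
https://login.passport.com/ppsecure/post.srf URL, which is not the same as the
login.srf page the form was found on. A minimal sketch of how the report could
map onto a request, assuming the two checkboxes are left unchecked and that
the submit1 value is the one quoted from the page source further down in this
message. The credentials user@example.com / secret are hypothetical
placeholders. The script only prints the target and body; it does not contact
the server.

```shell
#!/bin/sh
# Form action copied verbatim from the formfind.pl report above.
# Its query string stays in the URL and is not repeated in the body.
action='https://login.passport.com/ppsecure/post.srf?lc=1033&id=3&ru=http://memberservices.passport.com/memberservice.srf%3flc%3d1033&tw=20&da=passport.com'

# Hypothetical placeholder credentials, already URL-encoded ("@" -> "%40").
login='user%40example.com'
passwd='secret'

# Only the form's own inputs go into the body. The checkboxes (sec,
# mspp_shared) are assumed unchecked and therefore omitted, as a browser
# would do. The submit1 value is the one quoted from the page source.
body="notinframe=1&login=${login}&passwd=${passwd}&submit1=+Sign+In+"

printf 'POST to: %s\n' "$action"
printf 'Body:    %s\n' "$body"

# The request itself would then look like this (not run here):
#   curl -d "$body" "$action"
```

Keeping the URL and the body in separate variables makes it easy to compare
them against the form report before anything is sent.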

Then I tried the following steps with cURL to get the content:

Step 1. Dumping the header information without redirect

$ curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -D headers \
    -v -i -s -d \
    "lc=1033&id=3&ru=&tw=20&da=passport.com&notinframe=1&login=llp_gapper_at_yahoo.com&passwd=ladder&submit1=+Sign+In+" \
    -o /tmp/junk --url "http://login.passport.com/login.srf"

(I got the value for submit1 from the page source code. Is that right?)

* Connected to login.passport.com (64.4.59.254)
> POST /login.srf HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: login.passport.com
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Content-Length: 110
Content-Type: application/x-www-form-urlencoded

lc=1033&id=3&ru=&tw=20&da=passport.com&notinframe=1&login=llp_gapper_at_yahoo.com&passwd=ladder&submit1=+Sign+In+
* Closing connection #0

The dumped "headers" file looks like this:

$ more headers
HTTP/1.1 100 Continue
Server: Microsoft-IIS/5.0
Date: Wed, 27 Feb 2002 13:49:05 GMT

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Wed, 27 Feb 2002 13:49:05 GMT
Connection: close
Expires: Wed, 27 Feb 2002 13:48:06 GMT
Cache-Control: no-cache
cachecontrol: no-store
Pragma: no-cache
P3P: CP="DSP CUR OTPi IND OTRi ONL FIN"
Content-Type: text/html
Content-Length: 18991
Set-Cookie: MSPRequ=lt=1014817746&co=1&id=3
Set-Cookie: BrowserTest=Success?; domain=.passport.com;path=/;version=1

Step 2. Accessing the page using the header file dumped in step 1

$ curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b headers \
    -L -v -i -s -o junk.DATA --url \
    "http://moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=msft"
* Connected to moneycentral.com (207.46.189.14)
> GET /investor/quotes/pprtq.asp?Page=RTQ&Symbol=msft HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: moneycentral.msn.com
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Cookie: MSPRequ=lt=1014817746&co=1&id=3

* Follow to new URL:
/pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dmsft%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False
* Closing connection #0
* Follows Location: to new URL:
'http://moneycentral.msn.com/pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dmsft%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False'
* Disables POST, goes with GET
* Connected to moneycentral.com (207.46.189.14)
> GET
/pplogin.asp?Page=http://moneycentral.msn.com/investor/quotes/pprtq.asp&Query=Page%3DRTQ%26Symbol%3Dmsft%26REQUEST%5FMETHOD%3DGET&AuthTime=43200&ForceLogin=False
HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: moneycentral.msn.com
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Cookie: MC1=GUID=CE54EA89ACB944A2BA449EA6F8701C86;
MSPRequ=lt=1014817746&co=1&id=3

* Follow to new URL:
http://login.passport.com/login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dmsft%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014817808&ver=2.0.0248.1&tpf=4727ac5f0554d93f99f4048aaabbef61
* Closing connection #0
* Follows Location: to new URL:
'http://login.passport.com/login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dmsft%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014817808&ver=2.0.0248.1&tpf=4727ac5f0554d93f99f4048aaabbef61'
* Disables POST, goes with GET
* Connected to login.passport.com (64.4.59.254)
> GET
/login.srf?lc=1033&id=229&ru=http%3A%2F%2Fmoneycentral%2Emsn%2Ecom%2Fpploggedin%2Easp%3FPage%3Dhttp%253A%252F%252Fmoneycentral%252Emsn%252Ecom%252Finvestor%252Fquotes%252Fpprtq%252Easp%26Query%3DPage%253DRTQ%2526Symbol%253Dmsft%2526REQUEST%255FMETHOD%253DGET&tw=43200&kv=2&ct=1014817808&ver=2.0.0248.1&tpf=4727ac5f0554d93f99f4048aaabbef61
HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Host: login.passport.com
Pragma: no-cache
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Cookie: BrowserTest=Success?; MSPRequ=lt=1014817746&co=1&id=3

* Closing connection #0

I am getting nowhere and keep ending up at the login page. The cookies in the
"headers" file do not work; it looks like the login state is lost somewhere.
However, the cookie information in the "headers" file looks the same as what
Netscape gets in step 1. How can I get the cookies that Netscape gets in
step 2 using cURL? Are they required to access the content?
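
One way to carry login state from one curl invocation to the next is a cookie
jar: -c writes the cookies received (including those set along a redirect
chain followed with -L) to a file, and -b reads them back. A minimal sketch,
assuming the login POST goes to the form's action URL and that no JavaScript
is needed to complete the login, neither of which is confirmed here. The file
names cookies.txt, login.html, and quote.html and the helper names run, login,
and fetch are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: nothing is sent unless DRY_RUN=0 is set.
jar=cookies.txt    # hypothetical cookie jar file name
agent='Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: POST the login form; -c saves the cookies the server sets.
# $1 = form action URL, $2 = URL-encoded form body
login() {
    run curl -s -L -A "$agent" -c "$jar" -d "$2" -o login.html "$1"
}

# Step 2: request the page; -b sends the saved cookies, -c keeps any new
# ones, and -L follows the redirect chain.
# $1 = page URL
fetch() {
    run curl -s -L -A "$agent" -b "$jar" -c "$jar" -o quote.html "$1"
}

fetch 'http://moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=msft'
```

Passing the same file to both -b and -c in step 2 means cookies picked up
during the redirects are available to later requests as well.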

Have I done something wrong? Or is it not possible to access the page using
cURL because JavaScript is involved? Do you have any ideas?

Two suggestions:

1. Is it possible for curl to use Netscape's cookie file? Then we could use
Netscape as a tool to set up the client state.
2. It would be terrific if cURL had a recording function with a simple GUI
browser, so that you could use the GUI to access the page while curl records
all the information needed to get there. That would save a lot of switches
and flatten the learning curve for cURL.
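
On suggestion 1: curl's -b option accepts a file name, and the curl manual
describes the accepted file formats as plain HTTP headers or the
Netscape/Mozilla cookie file format. A minimal sketch of inspecting such a
file before handing it to curl, assuming the usual seven tab-separated fields
per line. The sample values and the helper name cookie_names are made up for
illustration, and the real location of Netscape's cookie file is not assumed
here.

```shell
#!/bin/sh
# Write a small sample in Netscape cookie-file format. Fields, separated
# by tabs: domain, subdomain flag, path, secure, expiry, name, value.
cookiefile=$(mktemp)
printf '.passport.com\tTRUE\t/\tFALSE\t2145801601\tMSPPre\tuser@example.com\n' > "$cookiefile"
printf '.msn.com\tTRUE\t/\tFALSE\t1604217640\tMC1\tGUID=0000\n' >> "$cookiefile"

# List the names of the cookies stored for one domain.
# $1 = cookie file, $2 = domain exactly as written in the file
cookie_names() {
    awk -F '\t' -v d="$2" '$1 == d { print $6 }' "$1"
}

names=$(cookie_names "$cookiefile" .msn.com)
echo "$names"

# The same file could then be passed to curl (not run here):
#   curl -b "$cookiefile" -L "http://moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=msft"

rm -f "$cookiefile"
```

Checking which cookies exist for a domain first helps tell a missing cookie
apart from a cookie that the server rejects.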

Thanks a lot for the help.

Yanhui

Received on 2002-02-27