curl-users

Re: Newbie's question on cURL usage

From: Ralph Mitchell <rmitchell_at_eds.com>
Date: Tue, 05 Mar 2002 00:41:16 -0600

Yanhui Liu wrote:

> Wow...!! Such a wild ride, and amazingly it worked! Thanks a lot, Ralph.
> This definitely makes my day brighter :-)
>
> Now, looking at the solution, I have a little more understanding of how
> to use cURL -- just follow the redirect (action) chains and keep the
> cookie information at the same time, until you reach the target. Am I
> right? But can this approach apply to every situation?

As you say, follow the redirects - this is how your browser would do it, so you would need to have a powerful reason to do it differently, such as knowing for sure a different approach would work.
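
In curl terms the general pattern looks something like this. The URLs and
form fields below are only placeholders (not the real MSN ones); the flags
are the same ones used in the script further down:

=========
 # start with an empty cookie jar
 cat /dev/null > cookies

 # fetch the protected page, following redirects and saving any cookies
 curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -s -L \
      -b cookies -c cookies -o step1.html \
      --url "http://www.example.com/protected/page"

 # post the login form, still reading and writing the same cookie jar
 curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -s -L \
      -b cookies -c cookies -o step2.html \
      -d "login=yourname&passwd=yourpassword" \
      --url "http://www.example.com/login/action"
=========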

> Also, I really think all these steps are too much for beginners. Would it
> be possible for cURL to automate this process? We know the input (login
> info) and the output (target page), so cURL could handle all the mess
> internally instead of shifting the burden to the user. Is this technically
> achievable? Maybe it is just a naive idea.

One major reason for not even attempting to implement this mess inside cURL is exactly that - it would be
extremely messy. For example, the page you've been trying to post to requires you to log in via Passport.com.
The Passport page expects the login name and password to be passed via the POST variables "login" and "passwd".
A page *I've* been trying to log in to recently expects the name and password to be passed via "userId" and
"userPasswd". Another page might use "userName" and "password", or "loginname", or "name", or "id", or anything
else the page author thinks is useful and/or humorous.
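
Which means the only practical way to find out what a particular site
expects is to fetch its login page and read the form yourself. Something
along these lines usually shows the action URL and the field names (the
page name here is just an example):

=========
 # grab the login page
 curl -s -o login.html --url "http://www.example.com/login"

 # where the form gets posted to
 grep -i 'action=' login.html

 # the POST variable names the author picked
 grep -i '<input' login.html
=========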

The thing to remember is that the author probably does not expect anyone to try to automate logging in to his page, so he'll write it in a way that works for him, without any regard to folks like us. Another important thing to remember is that the author may not even have English as his first language. The POST variable names might be any of the above listed words
(or any other words that "feel right") but in Swedish, French, German, Spanish, Icelandic, Urdu, Swahili, Italian or Dutch. Or possibly even in a non-human language such as Elvish or Dwarfish, if the dude happens to be a Tolkien fanatic... Or Klingon, if he's a Trekkie...

Sorry...

Ralph Mitchell

>
> Anyway, thanks again for the help.
>
> Yanhui
>
> PS: An example section could be created somewhere in the documentation;
> Ralph's solution would benefit quite a few people.
>
> At 05:21 AM 2/28/02 -0600, you wrote:
> >It looks to me like you might be posting to the wrong URL... And
> >possibly starting from the wrong location...
> >
> >Try this:
> >
> >=========
> > cat /dev/null > cookies
> >
> > # Start by trying to get the final target - provoke MSFT into making you login
> > curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
> >      -c cookies -L -o money0.html \
> >      --url "http://Moneycentral.msn.com/investor/quotes/pprtq.asp?Page=RTQ&Symbol=orcl"
> >
> > # Pull the login url out of the file - yep, really the last one, who knows why
> > url=`grep -i action money0.html | tail -1 | sed -e 's/^.*action="//' -e 's/".*//'`
> >
> > # Login with all the bells and whistles from the previous page
> > curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
> >      -c cookies -L -o money1.html \
> >      -d "notinframe=1&login=llp_gapper_at_yahoo.com&passwd=ladder&sec=rem&submit1=+Sign+In+&mspp_shared=1" \
> >      --url "$url"
> >
> > # Pull the META REFRESH tag and fetch that
> > url=`grep URL money1.html | sed -e 's/^.*URL=//' -e 's/".*//'`
> >
> > curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
> >      -c cookies -L -o money2.html \
> >      --url "$url"
> >
> > # Pull another META REFRESH tag and fetch that too
> > url=`grep url money2.html | sed -e 's/^.*url=//' -e 's/".*//'`
> >
> > curl -A "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" -b cookies -s \
> >      -i -c cookies -L -o money3.html \
> >      --url "$url"
> >
> > # money3.html should contain the target page...
> >=================
> >
> >Works for me most of the time... :) Sometimes I get a Location header
> >in money3.html that contains "?Error=TooManyResets" instead of the stock
> >quote page. Dunno why curl didn't follow that Location header...
> >
> >Seems like when it fails, the money3.html file comes out at around 563
> >bytes. Running the script again generally gets the proper result, which
> >is over 17Kb big.
> >
> >Ralph Mitchell
> >curl 7.9.5-pre4 (i686-pc-linux-gnu) libcurl 7.9.5-pre4 (OpenSSL 0.9.6c)
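
For the intermittent "?Error=TooManyResets" failure described above, a
crude size check plus a retry would probably cover it. This is only a
sketch: "getquote.sh" is an assumed name for the script above, and the
1000-byte threshold is a guess based on the ~563 byte failure vs. 17Kb
success mentioned earlier:

=========
 tries=0
 while [ $tries -lt 3 ]; do
     sh getquote.sh                   # assumed name for the script above
     size=`wc -c < money3.html`
     if [ $size -gt 1000 ]; then      # big file = real quote page
         break
     fi
     tries=`expr $tries + 1`          # small file = failed, try again
 done
=========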
Received on 2002-03-05