curl-users

Re: Logging into TIAA-CREF.org with a Tcl+cURL script -- (broken script posted)

From: Ralph Mitchell <ralphmitchell_at_gmail.com>
Date: Sun, 20 Apr 2014 12:39:25 -0400

Does the page have any kind of frames? I once had to screen-scrape a site
that had a top bar and a side bar, as well as the main box. I found that I
had to fetch both the top and side bars, in the correct order, to get
several cookies, otherwise the main box came back with nothing useful.
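
As a rough sketch of what I mean (not something I've run against TIAA-CREF), the idea is to fetch every piece on one handle with the cookie engine switched on, so that cookies set by an earlier response go out with the later requests. The proc name `fetch_in_order`, the jar path and the URL list are made up for illustration. Note that the quoted script only sets -cookiefile in its last step, so the earlier requests may not be keeping cookies at all.

```tcl
package require TclCurl

# Fetch each URL in the given order on ONE handle, so cookies set by an
# earlier response are sent along with the later requests.
# jarFile is a hypothetical path; it need not exist on the first run.
proc fetch_in_order {urls jarFile} {
    set h [curl::init]
    # Setting -cookiefile turns the cookie engine on; -cookiejar saves
    # whatever cookies are known when the handle is cleaned up.
    $h configure -cookiefile $jarFile \
                 -cookiejar $jarFile \
                 -followlocation 1
    set last ""
    foreach u $urls {
        set body ""
        $h configure -url $u -bodyvar body
        if {[catch {$h perform} err]} {
            $h cleanup
            return -code error "fetch of $u failed: $err"
        }
        set last $body
    }
    $h cleanup
    # Body of the last URL fetched (the "main box" in my case).
    return $last
}
```

You'd call it with something like `fetch_in_order [list $topBarUrl $sideBarUrl $mainUrl] /tmp/cookies.txt`, where the three URL variables are placeholders for whatever the frameset actually references.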

You might also want to download any javascript files and make sure there's
nothing in those creating cookies mid-flight.
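
A quick way to check, assuming you've already saved the script files locally, is to list every line that touches document.cookie. The proc name `find_cookie_refs` is just something I made up, and it only flags mentions; you'd still have to read the hits to see whether a cookie is actually being set.

```tcl
# Return every line of a JavaScript source that mentions document.cookie,
# each paired with its 1-based line number.
proc find_cookie_refs {jsText} {
    set hits {}
    set n 0
    foreach line [split $jsText \n] {
        incr n
        if {[regexp {document\.cookie} $line]} {
            lappend hits [list $n [string trim $line]]
        }
    }
    return $hits
}
```

Any cookie the page creates that way never shows up in a Set-Cookie header, so the script would have to add it to the request itself.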

Ralph Mitchell

On Sun, Apr 20, 2014 at 9:00 AM, <listmeister_at_thestoneforge.com> wrote:

> TIAA-CREF has a two step login process. The first form accepts
> username only, and the next page asks for a password and an extra
> security question (which never changes).
>
> The script below is able to send the username and get a form asking
> for the password. But when it comes to password submission, the login
> page comes back (despite no HTTP errors). The cookie changes a few
> times (which is possibly normal), but there seem to be *fewer*
> cookies in the header than what "Live HTTP Headers" shows Firefox to
> be exchanging.
>
> Any ideas? Any tiaa-cref customers want to collaborate on this?
>
> The script so far:
>
> #!/usr/bin/tclsh8.5
>
> package require TclCurl
> package require htmlparse
>
> set url(top) "https://publictools.tiaa-cref.org"
> set userAgent "Mozilla/5.0"
> set login_suburl "unset"
>
> proc extract_form {args} {
> # This procedure is needed to get the first login URL, which contains
> # a unique string instead of a cookie.
> global login_suburl
>
> foreach {tag slash param text} $args {break}
> if {$tag == "form" && [regexp {action..([^\"]*)\".*} $param -> action]} {
> set login_suburl "$action"
> }
> }
>
> proc getLoginForm {curlHandle} {
> $curlHandle configure \
> -url "https://publictools.tiaa-cref.org/private/selfservices/secureresource/redirect.do" \
> -bodyvar html_form \
> -errorbuffer errorBuffer
>
> if {[catch {$curlHandle perform} r ] == 0} {
> htmlparse::parse -cmd extract_form -incvar incomplete_html $html_form
> } else {
> puts "ERROR with getLoginForm perform operation"
> return -code error $errorBuffer
> }
> }
>
> proc getCookieAndPWform {curlHandle url fn_cookie} {
> set post_userid_list [list "userId=MY_USER_NAME"\
> "rememberMe=false"]
>
> $curlHandle configure -url $url\
> -postfields [join $post_userid_list &]\
> -writeheader $fn_cookie\
> -file /tmp/tiaa-cref_password_page.html
>
> if {[catch {$curlHandle perform} r] == 0} {
> set httpCode [$curlHandle getinfo httpcode]
> set contentType [$curlHandle getinfo contenttype]
> set redirectCount [$curlHandle getinfo redirectcount]
> set fileTime [$curlHandle getinfo filetime]
> set effUrl [$curlHandle getinfo effectiveurl]
> set totalTime [$curlHandle getinfo totaltime]
>
> return $fn_cookie
> } else {
> return -code error "$r $errorBuffer"
> }
> }
>
> proc supplyPW {curlHandle fn_cookie} {
>
> set post_data_list [list "password=MY_PASSWORD" \
> "securityQuestionAnswer=MY_SECURITY_ANSWER"]
>
> $curlHandle configure \
> -url "https://publictools.tiaa-cref.org/private/selfservices/sso/login.do?command=validateQuestion" \
> -postfields [join $post_data_list &] \
> -cookiefile $fn_cookie \
> -writeheader $fn_cookie \
> -file /tmp/tiaa-cref_account_page.html
>
> if {[catch {$curlHandle perform} r] == 0} {
> set httpCode [$curlHandle getinfo httpcode]
> set contentType [$curlHandle getinfo contenttype]
> set redirectCount [$curlHandle getinfo redirectcount]
> set fileTime [$curlHandle getinfo filetime]
> set effUrl [$curlHandle getinfo effectiveurl]
> set totalTime [$curlHandle getinfo totaltime]
> } else {
> puts "ERROR1"
> return -code error $errorBuffer
> }
> }
>
> set curlHandle [ ::curl::init ]
>
> $curlHandle configure -protocols https \
> -verbose 1 \
> -errorbuffer errorBuffer \
> -failonerror 1 \
> -followlocation 1 \
> -useragent $userAgent
>
> if {[catch {getLoginForm $curlHandle} r] == 0} {
> set loginHandle $r
> } else {
> puts "ERROR:"
> puts $r
>
> $curlHandle cleanup
> exit 1
> }
>
> if {[catch {getCookieAndPWform $curlHandle $url(top)/$login_suburl \
> /tmp/tiaa-cref_cookie.txt} r] == 0} {
> set fn_cookie $r
> } else {
> puts "ERROR:"
> puts $r
>
> $curlHandle cleanup
> exit 1
> }
>
> if {[catch {supplyPW $curlHandle $fn_cookie} r] == 0} {
> set fn_cookie $r
> } else {
> puts "ERROR:"
> puts $r
>
> $curlHandle cleanup
> exit 1
> }
> $curlHandle cleanup
>
> -------------------------------------------------------------------
> List admin: http://cool.haxx.se/list/listinfo/curl-users
> FAQ: http://curl.haxx.se/docs/faq.html
> Etiquette: http://curl.haxx.se/mail/etiquette.html
>

Received on 2014-04-20