curl-and-php
automating aspx VIEWSTATE via libcurl
Date: Sun, 31 Aug 2008 17:07:20 -0700
Hello, I have a PHP script that automates the following process for a web app:
https login, search page, result and, if the result is not empty, download of
an .xls file.
Here's the problem: I am doing the exact same https POST that a normal user
would, yet I get strange results with curl. For example, the search page is
for finding payment info. It takes a start date and an end date, as well as a
simple radio button for the payment type. I have been running this search for
almost 2 weeks now, and regardless of the dates I specify, I get the same
payment info every time, which is from the very first time I made the
request!
And now I am getting a "nothing was found" from this server, even though I
can log in myself, perform the exact same search, and get 20 pages of db
results!!
I was hoping that someone out there may be able to shed some light on this
process.
my strategy consists of the following:
a) retrieve today's login page using curl and save.
b) extract the hidden form fields (VIEWSTATE, EVENTVALIDATION) and save to
mysql table.
c) create the POSTFIELDS with mysql data and retrieve the response
- save the result and repeat b)
d) fill in search form
- repeat c), save result, repeat b)
f) if there were new payments found, repeat c)
g) and finally, repeat c), only this time download the .xls file and proceed
to the xls parser automation.....
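Steps b) and c) could be sketched roughly as below. This is only a minimal
sketch, not the code I actually run: the helper names extract_hidden_fields()
and build_postfields() are made up for illustration, and it assumes the hidden
inputs carry ASP.NET's usual names (__VIEWSTATE, __EVENTVALIDATION). It takes
the hidden fields from the page that was just received and URL-encodes them,
since the base64 values can contain "+", "/" and "=".

```php
<?php
// Hypothetical helper: collect every hidden <input> from a page so the
// next POST can send them back unchanged.
function extract_hidden_fields( $html )
{
    $fields = array();
    if ( !is_string( $html ) || $html === '' ) {
        return $fields; // nothing to parse (e.g. curl_exec() failed)
    }
    $doc = new DOMDocument();
    // Suppress warnings caused by real-world, not-quite-valid HTML.
    @$doc->loadHTML( $html );
    foreach ( $doc->getElementsByTagName( 'input' ) as $input ) {
        if ( strtolower( $input->getAttribute( 'type' ) ) !== 'hidden' ) {
            continue;
        }
        $name = $input->getAttribute( 'name' );
        if ( $name !== '' ) {
            $fields[$name] = $input->getAttribute( 'value' );
        }
    }
    return $fields;
}

// Hypothetical helper: merge the hidden fields with the visible form
// values and URL-encode the result for CURLOPT_POSTFIELDS.
function build_postfields( $hidden, $visible )
{
    return http_build_query( array_merge( $hidden, $visible ), '', '&' );
}
```

The string returned by build_postfields() would then be passed as $postfields
to the POST function, with the hidden fields taken from the response
immediately before it.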
Are you still with me here? Basically, I am not sure what the hidden
VIEWSTATE and EVENTVALIDATION data is, or whether I am being sabotaged for
making a robot. All I know is that the server isn't telling me I made a
BAD REQUEST (I know what that looks like).
Here are my curl functions. Please help me if you have done this kind of
task before!
This is the one I use to retrieve the initial page:
<?php
function get_with_cookies( $url )
{
    $options = array(
        CURLOPT_COOKIEJAR      => '/home2/bridgep6/includes/cookiefile.txt',
        CURLOPT_COOKIEFILE     => '/home2/bridgep6/includes/cookiefile.txt',
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING       => "",   // handle compressed
        // who am i (kept on one line so the header contains no line break)
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)",
        CURLOPT_AUTOREFERER    => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,  // timeout on connect
        CURLOPT_TIMEOUT        => 120,  // timeout on response
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
// And this is the main function I use for everything else
function post_web_page( $url, $postfields )
{
    $options = array(
        CURLOPT_POST           => 1,
        CURLOPT_COOKIEJAR      => '/home2/bridgep6/includes/cookiefile.txt',
        CURLOPT_COOKIEFILE     => '/home2/bridgep6/includes/cookiefile.txt',
        CURLOPT_POSTFIELDS     => $postfields,
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING       => "",   // handle compressed
        // who am i (kept on one line so the header contains no line break)
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)",
        CURLOPT_AUTOREFERER    => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,  // timeout on connect
        CURLOPT_TIMEOUT        => 120,  // timeout on response
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
?>
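Both functions return the curl_getinfo() array with errno, errmsg and content
added, so each step could be checked before its result is saved. The sketch
below is only an illustration: response_problem() is a made-up name, and the
check for "__VIEWSTATE" assumes the page is an ordinary ASP.NET form page (it
should be switched off for the .xls download).

```php
<?php
// Hypothetical helper: return a description of what looks wrong with a
// result array from get_with_cookies() / post_web_page(), or '' if the
// result looks usable.
function response_problem( $result, $expect_viewstate = true )
{
    if ( $result['errno'] !== 0 ) {
        return 'curl error ' . $result['errno'] . ': ' . $result['errmsg'];
    }
    if ( (int) $result['http_code'] !== 200 ) {
        return 'unexpected HTTP status ' . $result['http_code'];
    }
    // Assumption: a normal form page always carries a __VIEWSTATE field.
    if ( $expect_viewstate && strpos( $result['content'], '__VIEWSTATE' ) === false ) {
        return 'no __VIEWSTATE in response';
    }
    return '';
}
```

An empty string means the step can continue; anything else could be logged
together with $result['url'] to see at which step the session goes wrong.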
thank you very much,
Ryan Pope
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-php
Received on 2008-09-01