curl-users

LiveHTTPHeaders and ieHTTPHeaders and deriving scripts from their log output -- was: (OT) Submit string

From: Jochen Hayek <Jochen+curl_at_Hayek.name>
Date: Fri, 09 Feb 2007 15:35:41 +0100

>>>>>> "ML" == Michel Lemieux writes:

    ML> I was wondering if it was possible to somehow "trap" the submit string of a web form.

    ML> The thing is we have a web-based application that I have to access fairly often.
    ML> I would really like to simply script the whole thing and "submit" using curl.
    ML> But for this I need to know how those form parameters are parsed
    ML> and sent to the server.

    ML> Is that possible?

>>>>> "RM" == Ralph Mitchell writes:

> On 2/7/07, Peter Connell wrote:

    PC> Temporarily change the method from POST to GET and the browser will dump the
    PC> lot into the address line for you like magic, provided there are not so many
    PC> form elements as to blow the string limit for a GET.

    RM> If you have Firefox, then LiveHTTPHeaders works wonderfully.
    RM> It traps all the headers, both outgoing and incoming, even to
    RM> secure sites.

LiveHTTPHeaders is seriously a wonderful little helper.

If you use Internet Explorer for any reason,
then ieHTTPHeaders works wonderfully,
just like LiveHTTPHeaders for Firefox.

Because some sites insist on being front-ended by IE,
at least *I* have to cope with both of them.

With "some" experience it looks very straightforward to derive
a program e.g. in perl using WWW::Curl::Easy
from the log output of ieHTTPHeaders and also of LiveHTTPHeaders.
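
To give an idea, here is a minimal sketch of what such a derived script might look like
(the URL, form field names and values are made-up placeholders, not taken from any real log,
and not necessarily what my generator emits):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Curl::Easy;

    my $curl = WWW::Curl::Easy->new;

    # These values would simply be copied from the captured request in the header log:
    $curl->setopt(CURLOPT_URL,        'https://example.com/login.do');
    $curl->setopt(CURLOPT_POST,       1);
    $curl->setopt(CURLOPT_POSTFIELDS, 'user=jdoe&password=secret&action=login');
    $curl->setopt(CURLOPT_REFERER,    'https://example.com/login.html');

    # Keep the session cookies around for the follow-up requests:
    $curl->setopt(CURLOPT_COOKIEFILE, 'cookies.txt');
    $curl->setopt(CURLOPT_COOKIEJAR,  'cookies.txt');

    # Collect the response body into a scalar via an in-memory filehandle:
    my $response_body;
    open(my $body_fh, '>', \$response_body) or die "cannot open in-memory file: $!";
    $curl->setopt(CURLOPT_WRITEDATA, $body_fh);

    my $retcode = $curl->perform;
    close($body_fh);
    die "request failed with curl code $retcode\n" if $retcode != 0;

    my $http_code = $curl->getinfo(CURLINFO_HTTP_CODE);
    print "got HTTP $http_code, ", length($response_body), " bytes\n";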

Initially I did exactly that, although it was still somehow quite tedious ;-(

But guess what happened!

Not even two weeks after I got that script running,
which automated the access to that one web-site,
they changed some of the details on that web-site that my script depended on.
And there is *no* *chance* to keep them from constantly changing their web-sites.
So prepare yourself to adapt your script to such changes rather frequently,
if you seriously depend on such a web-site
and you want to keep your software running.

So wouldn't it be a nice idea
to have some little piece of software
that does at least some of the dull work for you?
I mean: reading the log output of ieHTTPHeaders or LiveHTTPHeaders
and writing the script that simulates the user accessing a web-site, rather automatically.

Yes, if you "meditate" a little, you realize
I should rather say semi-automatically than plain automatically.
There are a lot of hidden fields and URLs in the HTML carrying so-called session values,
which we have to extract and take into account for further processing.
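
To make the "semi" part concrete, here is a naive little helper (just a sketch with a
simple-minded regex, certainly not bullet-proof HTML parsing; the function name is made up)
that pulls the hidden <input> fields out of a page, so they can be echoed back in the next POST:

    # Sketch: collect name/value pairs of hidden <input> fields,
    # which is where those session values typically hide.
    sub hidden_fields {
        my ($html) = @_;
        my %field;
        while ($html =~ /(<input\b[^>]*\btype\s*=\s*["']?hidden["']?[^>]*>)/gi) {
            my $tag = $1;
            my ($name)  = $tag =~ /\bname\s*=\s*["']?([^"'>\s]+)/i;
            my ($value) = $tag =~ /\bvalue\s*=\s*["']([^"']*)["']/i;
            $field{$name} = defined $value ? $value : '' if defined $name;
        }
        return %field;
    }

The extracted pairs then simply get appended to the CURLOPT_POSTFIELDS string of the next request.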

Right, once you have finished polishing the script you initially got generated,
you will not want to start that work again from scratch the next time
they change their web-site "a little".
But you can still extract the relevant bits and pieces from the newly generated script, right?
Comparing the newly generated script to the earlier version helps a lot.

My generator script and the generated scripts deal well with "Location:", "Referer:" and other details
that the generator script finds in those log outputs,
and it uses as much of them as possible to build assertions,
so that our later processing will stop in a rather well-defined way
-- and you know it *will* stop once in a while,
as some of the assumptions built into the script will no longer hold.
You want to know that your script fails as soon as possible, rather than later, don't you?
That eases fixing problems a lot.
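
Just to illustrate the kind of assertion I mean (the helper name and its parameters are
made up for this mail; the generated scripts look different in detail):

    # Sketch: fail loudly and early, with a message that points at the broken assumption.
    sub assert_response {
        my ($curl, $header_text, $expect_code, $expect_location) = @_;

        my $code = $curl->getinfo(CURLINFO_HTTP_CODE);
        die "assertion failed: expected HTTP $expect_code, got $code\n"
            unless $code == $expect_code;

        if (defined $expect_location) {
            my ($location) = $header_text =~ /^Location:\s*(\S+)/mi;
            die "assertion failed: expected redirect to $expect_location\n"
                unless defined $location && $location eq $expect_location;
        }
    }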

Coming back to Michel Lemieux ...

Actually, quite a couple of years ago I started with such "web interface scripts" built on the curl command line itself,
but when a customer of mine had some work for me to do in that context (some months ago),
I reorganized my software to make use of libcurl instead.
That was a big step forward and made my software far more powerful.

I now have a couple of scripts retrieving CSV files from investment bank web-sites like Merrill Lynch and J.P.Morgan (for my asset management customer),
and for myself I run a few variants retrieving bank account statements.
If the data I am interested in is only available as HTML,
CSV-ish data is fairly easy to derive from that.
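
For example, with a CPAN module like HTML::TableExtract (just one way of doing it; the column
headers below are made up, and I am not claiming this is the exact code I run):

    use HTML::TableExtract;

    # Pick the table whose header row contains these (made-up) column names
    # and dump its rows as semicolon-separated values.
    my $te = HTML::TableExtract->new( headers => [ 'Date', 'Amount', 'Balance' ] );
    $te->parse($response_body);

    for my $table ($te->tables) {
        for my $row ($table->rows) {
            print join(';', map { defined $_ ? $_ : '' } @$row), "\n";
        }
    }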

You will find that software like my generator is rather easily constructed.
The ideas behind it are seriously not rocket science.
If you need help, you will find professional curl support through http://www.haxx.se/curl.html .
Of course I would love to be helpful as well.
European price levels apply ;-)
But maybe our friends from the Indian Silicon Valley offer quite some competence in this area too, and lower, more affordable price levels as well.

Jochen Hayek
Received on 2007-02-12