curl-and-python

trouble filling out a complex form--it can't find the submit button?

From: James Webber <jamestwebber_at_gmail.com>
Date: Fri, 27 Mar 2009 16:08:26 -0400

We have a server application with a CGI interface (users submit jobs, with
data files, through a web form). We want to drive that interface from our
own program, so I was trying to use pycurl to do that.

The form is fairly complex (a mix of different controls: text boxes, list
selections, etc.), and it includes a file upload, which is typically very
large.

I was trying to get a prototype to submit a simple job, but I couldn't
figure it out. Code follows:

import pycurl

# a wrapper class that makes adding entries easier
# I got this code while searching for a solution, but I can't find
# the address right now
class CurlForm(list):
    def add_string(self, name, str, type=None):
        options = [pycurl.FORM_CONTENTS, str]
        self.__add_optional(options, pycurl.FORM_CONTENTTYPE, type)
        self += (name, tuple(options)),

    def add_file(self, name, file, type=None, filename=None):
        assert file is not __builtins__.file

        options = [pycurl.FORM_FILE, file]
        self.__add_optional(options, pycurl.FORM_CONTENTTYPE, type)
        self.__add_optional(options, pycurl.FORM_FILENAME, filename)
        self += (name, tuple(options)),

    @staticmethod
    def __add_optional(list, flag, val):
        if val is not None:
            list += flag, val

# Handle to libcurl object
crl = pycurl.Curl()

# ... go to login page, log in. This part works fine ...

fname = # ... file name to upload

searchform = CurlForm()

search_form_seq = [ # ... parameters for the form ...
    # is this the right format? trying to enter values in the form
    ('edit[PARAMNAME]', 'value'),
    # some of these form entries are list-boxes, not text
    ('edit[ANOTHER]', 'value2'),
    # ... etc ...
]

for (n,s) in search_form_seq:
    searchform.add_string(n,s)

searchform.add_file('edit[FILENAME]', fname) # add the file name.

# possible issue here: the submit button doesn't have a 'name' field. it has
# a type. can pycurl find it?
searchform.add_string('submit','Start Search ...')

crl.setopt(pycurl.VERBOSE, 1)
crl.setopt(pycurl.URL, search_url)
crl.setopt(pycurl.HTTPPOST, searchform)

crl.perform()

print crl.getinfo(pycurl.HTTP_CODE), crl.getinfo(pycurl.EFFECTIVE_URL)

crl.close()

This code runs without errors, but it doesn't seem to _do_ anything. I log
in and get to the search page, but it doesn't submit the search to the
server. The output from the perform is the source of the search page itself,
which suggests to me that it didn't actually go anywhere.

I thought the issue might be that it wasn't finding the submit button
(because that button has no name), but adding a name didn't help. That
wouldn't be a viable solution anyway, as we need this application to work
without monkeying on the server side.

I guess I'm unclear on the correct format for all these form inputs...is
"edit[FILENAME]" the right way to do it?

Ideally, I'd like to be able to actually read the form and build a GUI for
the user (much like the GUI they find on the server's page), but I didn't
see any specific reading methods in pycurl. I suppose I could write a parser
for the page source, if necessary.
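
A minimal sketch of what such a page-source parser could look like, using
only the standard library. It is written for Python 3 (html.parser), unlike
the Python 2 code above, and everything named here (FormLister, list_forms,
SAMPLE_PAGE, the /cgi-bin/search.cgi action) is made up for illustration and
is not the real server's form. It lists each form's action URL and the name
of every control, which is the information needed to check whether the POST
goes to the form's action rather than to the page that displays the form,
and whether names like "edit[FILENAME]" match the page exactly.

```python
from html.parser import HTMLParser  # Python 3 standard library


class FormLister(HTMLParser):
    """Collect each <form>'s attributes and the controls inside it."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []      # one dict per <form> seen
        self._form = None    # the <form> currently open, if any
        self._select = None  # the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._form = {'action': attrs.get('action'),
                          'method': (attrs.get('method') or 'get').lower(),
                          'enctype': attrs.get('enctype'),
                          'controls': []}
            self.forms.append(self._form)
        elif self._form is None:
            return  # ignore controls outside any <form>
        elif tag in ('input', 'textarea'):
            self._form['controls'].append(
                {'tag': tag, 'type': attrs.get('type'),
                 'name': attrs.get('name'), 'value': attrs.get('value')})
        elif tag == 'select':
            self._select = {'tag': tag, 'type': None,
                            'name': attrs.get('name'), 'options': []}
            self._form['controls'].append(self._select)
        elif tag == 'option' and self._select is not None:
            # record the values a list-box would accept
            self._select['options'].append(attrs.get('value'))

    def handle_endtag(self, tag):
        if tag == 'select':
            self._select = None
        elif tag == 'form':
            self._form = None


def list_forms(page_source):
    """Return a list of form descriptions found in page_source."""
    parser = FormLister()
    parser.feed(page_source)
    parser.close()
    return parser.forms


# Hypothetical page, shaped like the form described in this message.
SAMPLE_PAGE = """
<form action="/cgi-bin/search.cgi" method="post" enctype="multipart/form-data">
  <input type="text" name="edit[PARAMNAME]" value="">
  <select name="edit[ANOTHER]">
    <option value="value1">One</option>
    <option value="value2">Two</option>
  </select>
  <input type="file" name="edit[FILENAME]">
  <input type="submit" value="Start Search ...">
</form>
"""

if __name__ == '__main__':
    for form in list_forms(SAMPLE_PAGE):
        print(form['method'], form['action'], form['enctype'])
        for control in form['controls']:
            print('   ', control['tag'], control['type'], control['name'])
```

In this sample the submit button shows up with a name of None. A browser
does not send a control that has no name, so a nameless submit button would
contribute nothing to the POST, and the server cannot be depending on it.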

It's possible pycurl simply isn't the right tool for this job--I looked at
the variety of different http-interface options out there, and pycurl seemed
to be the most robust, but I don't know if it handles this kind of thing.

thanks for any ideas,
 - James

_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2009-03-27