curl-users

Re: 'automatic configuration script' as proxy

From: Ralph Mitchell <rmitchell_at_eds.com>
Date: Tue, 02 Mar 2004 01:46:12 -0600

Attached are my notes on using Mozilla's javascript engine to process a
proxy autoconfig script. I haven't done anything with it since I wrote
this up in June '02, so it's possible things have changed (i.e. it may
now be broken).

It seems like you're hoping to get curl to actually parse and execute
javascript embedded in web pages. If that's the case, I think it's
going to take a *lot* of effort - at the very least you'll need to
replicate the data structures that a regular browser has for holding the
web page. For example, the javascript I commonly work with (against!)
refers to form elements within the html. One page I'm currently trying
to parse contains javascript functions that use document.write() to
create a form on the page (depending on browser type), and then there's
further javascript to process the form on submission. Another page I've
had to work with used a javascript form submission function to assemble
the form action url on the fly, depending on the contents of several
form fields.

OK, so you don't actually have to render the page as an image, but
you're talking about replicating a large fraction of a web browser.

And why stop there? What if the page you download refers to java
applets? Should curl support java internally?

I think you may be missing the point of curl, which is to go fetch stuff
from a web server (leaving out ftp, telnet, &c). As far as I can tell,
the closest curl gets to interpreting anything on a web page is: 1) to
follow redirects in the headers; 2) to save and send cookies. It's not
supposed to be interactive or graphical.

Ralph Mitchell

Edward S. Peschko wrote:

>On Mon, Mar 01, 2004 at 07:49:25AM +0100, Daniel Stenberg wrote:
>
>
>>On Sun, 29 Feb 2004, Edward S. Peschko wrote:
>>
>>
>>
>>>I was wondering if curl supported an 'automatic configuration script' as do
>>>mozilla, IE, and opera...
>>>
>>>
>>Nope. Try FAQ entry 3.14: http://curl.haxx.se/docs/faq.html
>>
>>
>
>ok, so there's an entry in the FAQ which seems to set in stone the
>lack of a piece of functionality (which would be very useful btw) and
>then goes on to explain that there is a freeware javascript engine available,
>and that people have used it successfully in the past..
>
>If people have used it successfully in the past, are there instructions on how to
>use it successfully posted somewhere, and if so, could they be put there?
>
>
>
>>The last time I had a look, the (only free one I know) javascript parser code
>>was a bigger code chunk than curl itself.
>>
>>
>
>That's fine, I would make it a dependency then, like perl's DBD::DB2 depends on
>having db-4 installed. And then make the dependency a voluntary one (ie: if the
>javascript interpreter is there support a given command line flag, if not, don't).
>
>Don't get me wrong, I think curl is very useful, and I do use it for ftp.
>I just think that me and a hell of a lot of other people *can't* use it
>effectively if this is not supported.
>
>And since all the main browsers support javascript, more and more websites
>(not just company access methods which a lot of people use in themselves) will
>become inaccessible by curl and therefore curl will diminish in value.
>
>Ed
>
>

attached mail follows:


Anybody that wants to try it on their own .pac scripts:

    1) Fetch the javascript interpreter from Mozilla.org
(ftp://ftp.mozilla.org/pub/js). The current version is js-1.5-rc4a.tar.gz.

    2) Unpack the tar file, go into the js/src directory and build the
standalone javascript interpreter with: gmake -f Makefile.ref

    3) You'll find the interpreter under a subdirectory named like your platform
type (I guess). On my RedHat Linux box the path to it is:
js/src/Linux_All_DBG.OBJ/js

    4) The tricky part was finding the 12 builtin scripts. The file they're in
is called nsProxyAutoConfig.js, and it's somewhere in the Mozilla.org tree.
Sorry, I can't be more helpful with that - I found it via the search engine...
Approximately halfway down the file you'll find "var pacUtils =" followed by the
builtins. I extracted them to another file and removed the extraneous quotes
and \n"+ characters.

    5) Add your .pac file to the end of the builtins, then add a further line
that looks something like this:

            print (FindProxyForURL("http://curl.haxx.se/", "curl.haxx.se"));

    6) Push the whole mess through the javascript interpreter:

            js/src/Linux_All_DBG.OBJ/js myproxy.pac

Depending on the URL you paste on the end, you should get "DIRECT", or
"PROXY proxy.yourdomain.com:80" or something similar.

OK, this is not the greatest hack in the world, but it kinda sorta works...
Something along the lines of:

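    #!/bin/ksh

    # Fetch the current .pac file from the proxy server.
    curl -o proxy.pac http://proxy.mydomain.com/proxy.pac

    # Feed the builtins, the .pac file and a final print() call to the
    # standalone javascript interpreter, and capture what it prints.
    proxyornot=`echo "print (FindProxyForURL(\"http://curl.haxx.se/\", \"curl.haxx.se\"));" | \
        cat builtins.pac proxy.pac - | js`

    if [ "$proxyornot" = "DIRECT" ]; then
        curl ......
    else
        # some other stuff to extract the proxy name from the variable and
        # paste it into the curl call...
        :   # no-op so the empty branch is still valid shell
    fi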

will tell you if you need to use your proxy or not. Of course, it would be
really nice to see this embedded in cURL... :) That's going to take some
effort, I think, though there are notes on Mozilla.org regarding embedding the
javascript interpreter into an application. I just haven't had time to pursue
that yet.
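
As a rough sketch of the "some other stuff" branch in the script above: the
helper below pulls the host:port out of the first entry of a FindProxyForURL()
result. The name pac_first_proxy is made up for illustration. It assumes the
result looks like "DIRECT" or "PROXY host:port", optionally followed by further
semicolon-separated entries, and it ignores everything after the first entry.

```shell
#!/bin/sh
# pac_first_proxy is a hypothetical helper, not part of curl or the
# javascript interpreter. It prints the host:port of the first entry in a
# FindProxyForURL() result when that entry is a PROXY entry; for DIRECT
# (or anything else) it prints nothing.
pac_first_proxy() {
    first=${1%%;*}        # drop everything from the first ';' onwards
    # Split the first entry into its keyword and its host:port part.
    read -r kind hostport rest <<EOF
$first
EOF
    if [ "$kind" = "PROXY" ]; then
        printf '%s\n' "$hostport"
    fi
}

# Example use, with $proxyornot set as in the script above:
#   proxy=`pac_first_proxy "$proxyornot"`
#   if [ -n "$proxy" ]; then
#       curl -x "$proxy" http://curl.haxx.se/
#   else
#       curl http://curl.haxx.se/
#   fi
```

The -x option is curl's usual way of naming a proxy. A .pac file that returns
several fallback entries would need more handling than this; the sketch only
ever looks at the first one.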

Ralph Mitchell

Daniel Stenberg wrote:

> [snip...] I guess the next step would be you telling us how to proceed to
> build this test setup you used to verify this. Then we need to incorporate
> this into libcurl in a suitable fashion.
>
> It would basically be a little libcurl-using client within the library itself,
> that would fetch the .pac first, and pass it through SpiderMonkey to figure
> out the correct proxy setting and then proceed. The .pac file would then
> probably be cached internally and the proxy gets re-evaluated whenever a new
> URL is used.
>
> --
> Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/

Received on 2004-03-02