curl-users

Re: 'automatic configuration script' as proxy

From: Edward S. Peschko <esp5_at_pge.com>
Date: Tue, 2 Mar 2004 11:51:10 -0800

On Tue, Mar 02, 2004 at 01:46:12AM -0600, Ralph Mitchell wrote:
> Attached are my notes on using Mozilla's javascript engine to process a
> proxy autoconfig script. I haven't done anything with it since I wrote
> this up in June '02, so it's possible things have changed (i.e. it may
> now be broken).
>
> It seems like you're hoping to get curl to actually parse and execute
> javascript embedded in web pages. If that's the case, I think it's
> going to take a *lot* of effort - at the very least you'll need to
> replicate the data structures that a regular browser has for holding the
> web page. For example, the javascript I commonly work with (against!)
> refers to form elements within the html and one page I'm currently
> trying to parse contains javascript functions using document.write() to
> create a form on the page (depending on browser type), then there's
> further javascript to process the form on submission. Another page I've
> had to work with used a javascript form submission function to assemble
> the form action url on the fly, depending on the contents of several form
> fields.

Yes, that's part of what I'm thinking, although I think that you are
misunderstanding the scope of what I'm looking at:

There's a button in Opera, Mozilla and IE - 'use automatic configuration script'.

As far as I can see, this button lets you point the browser at a script that
defines a function, FindProxyForURL, which takes two arguments - url and host.

It then returns either 'DIRECT' or 'PROXY ....' depending on the url and host
you are trying to reach, based on the contents of that function. The proxy it
names may or may not ask for authentication, depending on what the proxy
sends back.

So, IMO, the scope of the problem is limited - in order to support this you don't
need to support all the uses and execution of javascript, just the execution
of one function which returns a string.
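
To make the "one function which returns a string" idea concrete, here is a
minimal sketch in JavaScript. It is not how curl or any browser does it. The
script text, the proxy.example.com host and the loadPac and parsePacResult
names are all made up for illustration, and only one of the helper functions
that such scripts conventionally call (isPlainHostName) is supplied. The
'PROXY host:port; DIRECT' return format is the usual one for these scripts.

```javascript
// Hypothetical script text; a real one would be fetched from the URL
// entered under 'use automatic configuration script'.
const pacSource = `
function FindProxyForURL(url, host) {
  if (isPlainHostName(host)) return "DIRECT";
  return "PROXY proxy.example.com:8080; DIRECT";
}`;

// One of the helpers these scripts expect the host program to provide.
function isPlainHostName(host) {
  return host.indexOf(".") === -1;
}

// Compile the script text and hand back its FindProxyForURL function.
// Note: this runs whatever code the script contains, so it is only
// suitable for scripts you trust.
function loadPac(source) {
  const factory = new Function(
    "isPlainHostName",
    source + "\nreturn FindProxyForURL;"
  );
  return factory(isPlainHostName);
}

// Split the returned string into an ordered list of choices.
function parsePacResult(result) {
  return result
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [type, target] = entry.split(/\s+/);
      return { type: type.toUpperCase(), target: target || null };
    });
}

const findProxy = loadPac(pacSource);
const choices = parsePacResult(
  findProxy("http://www.example.com/", "www.example.com")
);
console.log(choices);
```

A client would walk the resulting list in order and use the first entry that
works, falling back to a direct connection where the script allows it.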

Of course, people could do incredibly fancy things with their javascript when
forwarding to a proxy, but in practice they don't really do this with the
automatic proxy configuration script. In the admittedly small sample that I
have had access to (4 sites which used this), they've all had a bunch
of if-then statements based on url and host to determine how to route it.
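
For illustration, a script of the "bunch of if-then statements" kind might
look roughly like the sketch below. It is not taken from any of the four
sites; every host name and port is invented. dnsDomainIs is one of the helpers
a browser normally supplies to these scripts, and a stand-in is defined here
only so the example runs on its own.

```javascript
// Stand-in for the dnsDomainIs helper that browsers provide; included
// only so this sketch is self-contained.
function dnsDomainIs(host, domain) {
  return (
    host.length >= domain.length &&
    host.substring(host.length - domain.length) === domain
  );
}

function FindProxyForURL(url, host) {
  // Unqualified and local names never go through the proxy.
  if (host === "localhost" || host.indexOf(".") === -1) {
    return "DIRECT";
  }
  // Hypothetical internal domain: connect directly.
  if (dnsDomainIs(host, ".corp.example.com")) {
    return "DIRECT";
  }
  // Hypothetical dedicated proxy for ftp URLs.
  if (url.substring(0, 4) === "ftp:") {
    return "PROXY ftpproxy.example.com:2121";
  }
  // Everything else: try the main proxy, then fall back to direct.
  return "PROXY proxy.example.com:8080; DIRECT";
}

console.log(FindProxyForURL("http://slashdot.org/", "slashdot.org"));
```

Nothing here touches a page or a DOM; the function only looks at its two
string arguments and returns another string.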

> OK, so you don't actually have to render the page as an image, but
> you're talking about replicating a large fraction of a web browser.
>
> And why stop there? What if the page you download refers to java
> applets? Should curl support java internally?

I think you are trying to increase the scope of the problem unnecessarily.
The stuff that you refer to doesn't hamper my ability to get documents off
the web in the first instance, i.e. it doesn't make curl unusable for what it
is intended to do.

The fact that curl doesn't handle automatic proxy configuration - which is standard
in all browsers - *does*. That's why it should support it - it's directly
in the problem domain that curl sets up for itself.

In other words:

        1) I want to set up automatic http:... of a certain piece of code that I found
           on slashdot or automatically retrieve a url from yahoo.

        2) I can't use curl because of this authentication method that curl doesn't
           support.

I can hack around it and get something that sort of works, but I don't want to do
that. I want my solutions to be portable to other users' environments. I'm sick
of unix - and automation - being second-class citizens because of this. I don't
want to have to code wrappers around curl to have it work. I just want to be able
to do it transparently.

Anyways, I'll take a look at your notes, and maybe hack something for the time
being. But I hope that what I'm saying doesn't get ignored, or swept under the
rug. Like I said, I'm not the person to code this in the first instance, but I
can give testing and porting support.

Ed

(
ps - I'm not categorically opposed to a -javarun flag or somesuch in the case of
java attachments or a -jsrun flag in the case of javascript; they would be exceedingly
useful in themselves, but like you said, they aren't as fundamentally necessary as
supporting auto proxy config.

Although the number of times I've hit my head when I've come across a site that I
can't automate because it contains java/javascript is fairly high..
)
Received on 2004-03-02