curl-library

RE: Libcurl suggestions

From: Daniel Stenberg <daniel-curl_at_haxx.se>
Date: Tue, 9 Dec 2003 14:03:41 +0100 (CET)

On Mon, 8 Dec 2003, DOMINICK C MEGLIO wrote:

[CURLOPT_URLENCODE]

> > I'm against adding this feature because it misleads people into
> > believing libcurl will do the right thing for them
>
> I can understand what you're saying here. The problem is, as a client coder,
> I'd rather spend my time working on my application, not on parsing and
> dealing with a URL a user inputs. I'm not sure how exactly this should be
> dealt with, it just seems like a problem to me.

In my view, this is very similar to your request about getting the file name
from a URL. You want to create URLs and get information from existing URLs,
without having that knowledge in your code.

That certainly requires some code/library that knows these things. I don't
think this is a job for libcurl.

[CURLOPT_DISABLEPROTOCOLS]

> > I figure this would need to be bits set in curl_global_init() for this to
> > be really useful?
>
> Well I did it differently (patch attached). I made curl_easy_setopt(curl,
> CURLOPT_DISABLEPROTOCOLS, CURLPROT_LDAP|CURLPROT_DICT); for example.

Well, having it like this makes it impossible for us to add a function that
would return whether the protocol is supported or not, without forcing that
function to require an easy handle. It feels a bit odd.

> Now of course I could simply do some string comparisons to test if it starts
> with "telnet://" or whatever, but I figured, the library is already
> determining which protocol it is, why not let the library determine if it
> should be disabled? So that's exactly what I did.

I think I'd rather have libcurl offer basically the same functionality using a
different approach:

We add a function - curl_checkproto (yes, we should discuss the actual name of
it) that returns the protocol of the given URL, if libcurl supports it. And it
returns protocol-is-unsupported if not. It would also help applications
realize which protocol libcurl uses when you don't specify the protocol part
and libcurl takes a guess (using "www.foobar.com" etc).

Then, you know which protocol is used and you can decide if you want to
let libcurl transfer that URL or not. In your case, you'd explicitly forbid a
few known protocols per above, and let newly added ones proceed.
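
Until something like curl_checkproto exists, an application can approximate
the "forbid a few known protocols" part on its own. The following is only a
minimal sketch: url_scheme() and url_is_forbidden() are hypothetical helper
names, not libcurl functions, and the deny list is an assumption taken from
the protocols mentioned in this thread. It cannot tell which protocol libcurl
would guess for a URL without a scheme, so such URLs are let through.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: copies the lowercased scheme of
   `url` into `out` and returns 1. Returns 0 if the URL has no "scheme://"
   prefix, the scheme has invalid characters, or it does not fit in `out`. */
int url_scheme(const char *url, char *out, size_t outlen)
{
    const char *sep = strstr(url, "://");
    size_t len, i;

    if (sep == NULL || sep == url)
        return 0;
    len = (size_t)(sep - url);
    if (len >= outlen || !isalpha((unsigned char)url[0]))
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)url[i];
        /* a scheme may only hold letters, digits, '+', '-' and '.' */
        if (!isalnum(c) && c != '+' && c != '-' && c != '.')
            return 0;
        out[i] = (char)tolower(c);
    }
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: returns 1 if the URL's explicit scheme is on the
   application's deny list, 0 otherwise (including URLs with no scheme). */
int url_is_forbidden(const char *url)
{
    static const char *const denied[] = { "ldap", "dict", "telnet" };
    char scheme[16];
    size_t i;

    if (!url_scheme(url, scheme, sizeof(scheme)))
        return 0;
    for (i = 0; i < sizeof(denied) / sizeof(denied[0]); i++) {
        if (strcmp(scheme, denied[i]) == 0)
            return 1;
    }
    return 0;
}
```

An application would call url_is_forbidden() on the user's URL before handing
it to libcurl. Because the list names what is denied rather than what is
allowed, newly added protocols proceed by default.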

[curl_getfilename]

> > Why settle with getting just the file name? What about host name? Port
> > number? Protocol? Password? etc etc... (and then someone comes up with the
> > brilliant idea of having functions that let you set one of those fields)

> Well I wouldn't settle for just the filename. The way I'm planning to do it
> is move curl's URL parsing code to a Curl_parse and then expose a
> curl_urldata_get and curl_urldata_free. And you're right, someone would come
> up with such functions to set those things, who would this person be you
> ask? Well it's you! CURLOPT_USERPWD for example lets you set the username
> and password. As far as I can tell that works the same as including the
> user:password right in the URL.

Yes and no.

1 - Setting CURLOPT_USERPWD does not modify any URL, so you cannot set the
    username+password and then see how that modified the URL.

2 - CURLOPT_USERPWD actually overrides the existing username+password in the
    URL, so the option is an addition to the info in the URL.

3 - If no username+password is set in the URL, then yes, setting
    CURLOPT_USERPWD is an exact equivalent to embedding them in the
    URL.

Further, as you might already know, the current libcurl doesn't parse the URL
until you call curl_easy_perform(). (This isn't an argument either way, it
is just information.)

> I can understand why you don't want such a feature, but I'm sure you also
> know that many users would definitely value such a feature.

Possibly, but I have more things to consider in this than anyone else has. I'm
not only the one who will most likely get to maintain this code in the future, I
also need to put my foot down at times to make sure libcurl doesn't slowly
trespass into areas where it doesn't belong. I've previously denied URL
modifying and extraction functions to enter, and I still haven't been
convinced that libcurl needs to offer this functionality. The URL encoding and
decoding functions are just about enough in my view.

There are perfectly fine URL/URI libraries already, and if there isn't one
that suits you I'm sure you can make one pretty swiftly that will.

> If you don't think implementing this is useful, would you at least consider
> an "effective filename" lookup? What I mean by this is, the filename I
> specify isn't necessarily the file I get. If I go to
> blah.com/download.php?file=2 download.php isn't the file I get, either using
> Location: or Content-Disposition: it's very likely that I'll be redirected
> to another file, maybe somedl.txt. From what I can tell, there is no way for
> me to discover that the file I actually want to save locally is somedl.txt.

Surely there is. You can parse all those incoming headers and figure out what
you want to call your target file. libcurl has no concept of "file names" and
doesn't care one bit what you do with the info it provides. It would be weird
to add a special-case in libcurl for this particular info. Wouldn't it?
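
As a rough illustration of the header parsing described above, here is a
minimal sketch that pulls a file name out of one Content-Disposition header
line. disposition_filename() is a hypothetical name, not a libcurl function.
The sketch assumes the line is a zero-terminated string, matches the
"filename=" parameter case-sensitively, and ignores the extended "filename*="
form, so it is far from a complete parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: if `header` is a
   Content-Disposition line carrying a filename= parameter, copies the value
   into `out` and returns 1. Returns 0 otherwise, or if `out` is too small. */
int disposition_filename(const char *header, char *out, size_t outlen)
{
    static const char name[] = "content-disposition:";
    static const char key[] = "filename=";
    const char *p, *end;
    size_t i, len;

    /* header field names are case-insensitive */
    for (i = 0; name[i] != '\0'; i++) {
        if (tolower((unsigned char)header[i]) != name[i])
            return 0;
    }
    p = strstr(header + i, key);
    if (p == NULL)
        return 0;
    p += strlen(key);
    if (*p == '"') {
        /* quoted value: take everything up to the closing quote */
        p++;
        end = strchr(p, '"');
        if (end == NULL)
            return 0;
    } else {
        /* bare token: stop at a separator or the end of the line */
        end = p + strcspn(p, "; \t\r\n");
    }
    len = (size_t)(end - p);
    if (len == 0 || len >= outlen)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

One way to feed it would be from a callback set with CURLOPT_HEADERFUNCTION.
The header data passed to that callback is not zero-terminated, so the
application would have to copy each line into a terminated buffer first.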

> > you can use one of the existing URI/URL parsing libraries "out there". You
> > know the full URL yourself.

> It seems wasteful to me to include a library so that I can use 1 function.

Only if it would be a wasteful library. I can't see how a URL parsing library
would be wasteful if that's what you want. Yes, it would duplicate some of
libcurl's code, but to make libcurl able to properly export this information
you'd need to add a good chunk of new code anyway.

> Not to mention you again run into the possibility of inconsistencies, again
> my fictional fake:// URL scheme. In 7.20.0 libcurl supports fake://. Say I
> use libwww to parse the URL. Well my program was designed for libcurl 7.11.0
> and libwww 5.4.0. Neither of these versions support fake://. But now the
> user notices libcurl 7.20.0 was released and upgrades it. The user also
> upgrades libwww to 7.0.0. What happens if libwww 7.0.0 doesn't support
> fake://? Well it means my program can't load fake:// URLs because I have no
> way to parse them. So even though libcurl 100% has the means to work with
> fake:// urls, my program can't until libwww is updated to support fake://.

1. We don't add entirely new URL schemes to libcurl *that* often. (I just
   checked, the last time it happened was March 2001.)

2. You would of course have a basic default approach used if the parser
   library can't get a file name from the URL

> Anyway, I realize you wouldn't make such a feature yourself, but does that
> mean you'd also reject it if I were to code it and submit it as a patch? If
> so I won't waste my time.

I will not accept patches that introduce "URL fiddling", no. Sorry.

> > One of the ideas behind the concept of URLs is that they look and work the
> > same, independent of the underlying protocol. Thus, you should be pretty
> > safe to assume that no such big surprises will pop up even in future
> > versions of libcurl.
>
> That may be the theory, but it isn't like that in practice. Take for example
> the draft for the irc:// URL
> <http://www.ietf.org/internet-drafts/draft-butcher-irc-url-03.txt>

[...]

> certainly doesn't look like an http/ftp URL to me. Different protocols have
> different needs and therefore the URL scheme has to be adapted in some ways
> to accommodate those needs.

If a future libcurl would introduce support for that (which seems pointless
indeed), I would say that the "file name" returned for this URL probably would
be the part on the right side of the last slash.

Even if that wouldn't be the case, I don't think the risk of inconsistency is
enough to warrant such code to get added to libcurl.
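
The "right side of the last slash" rule, combined with a default for when
nothing usable is found, could be sketched in application code like this.
url_basename() and the fallback argument are hypothetical names chosen for
illustration. The sketch keeps any query string as part of the name and does
no decoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: returns a pointer to the text
   after the last slash in the path of `url`, or `fallback` if the URL has no
   path or ends in a slash. The result points into `url` or is `fallback`. */
const char *url_basename(const char *url, const char *fallback)
{
    const char *start = strstr(url, "://");
    const char *slash;

    /* skip the "scheme://" part so its slashes are not mistaken for path */
    start = (start != NULL) ? start + 3 : url;

    slash = strrchr(start, '/');
    if (slash == NULL || slash[1] == '\0')
        return fallback;
    return slash + 1;
}
```

For "http://example.com/dir/somedl.txt" this yields "somedl.txt", and for
"http://example.com/" it yields whatever fallback the application passed in.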

-- 
    Daniel Stenberg -- http://curl.haxx.se/ -- http://daniel.haxx.se/
   [[ Do not send mails to this email address. They won't reach me. ]]
Received on 2003-12-09