Re: Fwd: Bug#311112: curl: Escape <space> in HTTP requests

From: Daniel Stenberg <daniel-curl_at_haxx.se>
Date: Mon, 6 Jun 2005 23:13:18 +0200 (CEST)

On Sat, 4 Jun 2005, Andreas Metzler wrote:

(now only on the users list, this is not a libcurl issue)

Thanks for getting back on the subject and being persistent. Discussions and
opinions are vital!

>> Having curl translate illegal letters into something else would take
>> a lot of effort and "intelligence" in the tool I much rather avoid.
>> Is translation to %20 *always* wanted when you pass a space in the
>> URL? I doubt that.
>
> Hello, I do not know the respective RFCs to answer this question correctly,
> however I'd _guess_ that at least in a HTTP-URL 's/ /%20/g' is always the
> correct thing to do. (That is what e.g. Mozilla does.)

I beg to differ. For example, if the URL would look like this:
"http://siteone.com http://sitetwo.com" I would rather suspect that the space
was meant to be a URL separator and not %20.
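
To make the ambiguity concrete, here is a small Python sketch (not curl code;
both function names are made up for illustration) that applies the two
possible readings to the same input string:

```python
def as_single_url(text):
    """Reading 1: the space belongs to one URL and is escaped as %20."""
    return [text.replace(" ", "%20")]


def as_url_list(text):
    """Reading 2: the space separates two independent URLs."""
    return text.split(" ")


given = "http://siteone.com http://sitetwo.com"
print(as_single_url(given))  # a single, rather odd-looking URL
print(as_url_list(given))    # two separate URLs
```

Both results are defensible, which is why a tool cannot reliably guess which
one the user meant.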

Besides, many servers accept spaces in GET requests just fine, so there are
users who want to send plain spaces in their requests.

The only thing the "respective RFCs" would tell you, is that space (hex 20,
decimal 32) is invalid within a URL and thus you cannot use it. A URL with a
space is therefore an oxymoron and cannot happen. So we need to make a
decision on how to deal with them.

I can think of three different approaches to the "space problem" (it applies
to numerous other illegal URL characters as well):

1) pass on what you ask it, as far as possible (the current approach)

2) reject all illegal URLs

3) translate illegal URLs into the supposedly wanted version
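
As a rough illustration only, the three approaches listed above could be
sketched in Python as follows. This is not how curl is implemented; the
function name, the policy names and the choice of the RFC 3986 character set
as the definition of "legal" are all assumptions made for the example:

```python
import re

# Assumption: "legal" means the unreserved and reserved characters of
# RFC 3986, plus "%" so that existing %HH escapes are accepted.
LEGAL = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def handle_url(url, policy="pass"):
    """Apply one of the three approaches to a possibly illegal URL."""
    if policy == "pass":
        # 1) pass on whatever was given, untouched
        return url
    if policy == "reject":
        # 2) refuse any URL that contains an illegal character
        bad = [c for c in url if not LEGAL.fullmatch(c)]
        if bad:
            raise ValueError(f"illegal URL characters: {bad!r}")
        return url
    if policy == "translate":
        # 3) guess that the %HH form of each illegal character was wanted
        return "".join(
            c if LEGAL.fullmatch(c)
            else "".join(f"%{b:02X}" for b in c.encode("utf-8"))
            for c in url
        )
    raise ValueError(f"unknown policy: {policy}")
```

With the input "http://a.com/x y", the "pass" policy returns the string
unchanged, "reject" raises an error, and "translate" returns
"http://a.com/x%20y".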

I picked approach 1 ages ago because I think this is the right way for curl,
a rather low-level URL-based tool. I still think so. Of course I welcome
discussions and opinions on what approach to use.

Of course we could add an option for it. See further below.

>> Also (of course) different servers and different protocols will treat that
>> literal space differently.
>
> I was trying to comment only on HTTP(S). ;-)

Yes, and I widened the perspective since URLs are not HTTP alone and neither
is curl. In fact, curl doesn't even know what type it is, it just passes it on
to libcurl!

> I /thought/ curl was supposed to not require intrinsic knowledge on HTTP
> (like e.g. netcat).

How did certain basic knowledge of the URL format become "intrinsic knowledge
on HTTP" ?

> FWIW wget does what I suggested.

Sure, Mozilla and wget do things differently than curl. In fact, they do all
sorts of things on the given URL. It is not in itself an argument that
convinces me that curl should do as they do.

> - This is not intended as a "If you don't do what I want you to, I'll switch
> to $alternative_project" but really just as a point of information from a
> user's POV, "${alternative, similar project} works this way, so it might be
> safe".

I'm not so sure about that. At least one of the mentioned comparison tools
does things to the given URL that make some operations impossible with it,
although they are possible with curl.

I guess this might be because we have different goals and ideas in mind when
we make decisions on how to act on certain input.

> Of course the behavior I suggest would need to be made default, I'd be
> entirely happy with --escape-disallowed-characters-in-http-requests.

I wouldn't mind seeing such an option.

I suppose it would make even more sense if it was made an option that converts
everything non-URL into its %HH-coded version and not only space. Like
--url-sanitise or something.
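
A minimal sketch of what such an opt-in option might look like, written in
Python rather than as a curl patch. The flag name is taken from the suggestion
above and is hypothetical, as are the helper names; the set of characters left
untouched assumes RFC 3986, and "%" is kept so that already-escaped input is
not escaped twice:

```python
import argparse
from urllib.parse import quote

# Assumption: RFC 3986 reserved characters stay as-is, plus "%" so that
# existing %HH escapes survive. quote() never touches letters and digits.
KEEP = ":/?#[]@!$&'()*+,;=%~"


def build_parser():
    parser = argparse.ArgumentParser(description="toy URL front end (not curl)")
    # Hypothetical flag; off by default so the pass-through behaviour remains.
    parser.add_argument("--url-sanitise", action="store_true")
    parser.add_argument("url")
    return parser


def effective_url(argv):
    """Return the URL that would be handed on to the transfer library."""
    args = build_parser().parse_args(argv)
    if args.url_sanitise:
        # Convert every character outside KEEP into its %HH-coded form.
        return quote(args.url, safe=KEEP)
    return args.url
```

Without the flag the URL is passed on verbatim; with it, a space becomes %20
and other disallowed characters are escaped the same way. Keeping "%" in the
safe set means a stray literal "%" is not fixed up, which is one of the
judgment calls such a patch would have to make.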

Anyone up to writing such a patch?

-- 
  Commercial curl and libcurl Technical Support: http://haxx.se/curl.html
Received on 2005-06-06