curl-users

Re: Escaping URL's (was Japanese characters in URL)

From: Cris Bailiff <c.bailiff_at_awayweb.com>
Date: Tue, 08 May 2001 14:04:33 +1000

..... utf8 encoding elided...

>
> Hehe. No clue....
>
> What I'm more thinking is if someone sets CURLOPT_URL (I know very little
> about the command line tool. ;) to:
>
> http://www.google.com?search=Johnny Carson's Last %
>
> That the url get automagically encoded by cURL...

Not automagically please - I don't want curl to manipulate the URL
unless I say so... It's good/fine to have a function (curl_escape?) which
normalises URLs to canonical/legal form, but if I've already done that
to my URL, it's wrong/broken to do it again. The problem is that you
don't know if a URL is escaped or unescaped, and can't tell by looking.

E.g.

http://www.google.com?search=Johnny Carson's Last %

should obviously be escaped as:

http://www.google.com?search=Johnny%20Carson%27s%20Last%20%25

but if curl did this automa(tg)ically, what happens when I give it this
URL?

http://www.google.com?search=Johnny%20Carson%27s%20Last%20%25

Does it escape it or not? Why should it, it's already correct, but how do
you know? If it comes out as:

http://www.google.com?search=Johnny%2520Carson%2527s%2520Last%2520%2525

then you broke it....
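
To make the double-escaping problem concrete, here is a minimal sketch in
Python. It uses the standard library's urllib.parse.quote as a stand-in for
whatever escaping function curl might offer; it is not curl's own code, and
the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The unescaped query value from the example above.
raw = "Johnny Carson's Last %"

# Escaping once gives the legal form (safe="" escapes everything
# except unreserved characters).
once = quote(raw, safe="")
print(once)   # Johnny%20Carson%27s%20Last%20%25

# Escaping the already-escaped string turns every '%' into '%25'.
twice = quote(once, safe="")
print(twice)  # Johnny%2520Carson%2527s%2520Last%2520%2525

# One round of unescaping no longer recovers the original text.
print(unquote(once) == raw)    # True
print(unquote(twice) == raw)   # False

# The ambiguity: "%41" may be the escaped form of "A", or the literal
# three characters '%', '4', '1'. Nothing in the string says which.
maybe = "%41"
print(unquote(maybe))          # A      (reading it as already escaped)
print(quote(maybe, safe=""))   # %2541  (reading it as unescaped)
```

Both readings of the last string are valid, which is why the caller has to
say whether the URL is already escaped.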

Curl could have a switch to say 'escape/normalize this url', but it
should default to off.

If you want to turn multibyte characters into escaped strings, the same
switch might suffice, but I suspect what you really want is a
multibyte->UTF-8 normalizer, followed by selective escaping of the UTF-8
bytes. Remember, the URL might already have other (legal) %-encoded
strings, so you can't just run the general escaping switch over the
string - only over your newly normalized binary characters, otherwise
you'll double escape the '%20's etc again...
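
A rough sketch of that two-step idea, again in Python rather than anything
curl provides. The helper name escape_non_ascii, the example.com URL and the
choice of Shift_JIS as the incoming multibyte encoding are all assumptions
made for illustration.

```python
from urllib.parse import quote

def escape_non_ascii(url: str) -> str:
    """Percent-encode only the non-ASCII bytes of `url` (as UTF-8).

    ASCII bytes, including any existing '%XX' sequences, pass through
    untouched, so already-escaped parts are not escaped again.
    (Hypothetical helper, not part of curl.)
    """
    return "".join(
        chr(b) if b < 0x80 else f"%{b:02X}"
        for b in url.encode("utf-8")
    )

# Step 1: normalize the multibyte input (assumed Shift_JIS here) to text.
legacy = "http://example.com/?q=日本%20go".encode("shift_jis")
text = legacy.decode("shift_jis")

# Step 2: escape only the newly normalized non-ASCII characters.
escaped = escape_non_ascii(text)
print(escaped)  # http://example.com/?q=%E6%97%A5%E6%9C%AC%20go

# A general escaper run over the whole string mangles the existing %20.
general = quote(text, safe=":/?=")
print(general)  # http://example.com/?q=%E6%97%A5%E6%9C%AC%2520go
```

Because the result is pure ASCII, running escape_non_ascii over it a second
time changes nothing, which is the property the general escaping switch lacks.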

Cris
Received on 2001-05-08