curl-users

Re: Japanese characters in URL

From: Sterling Hughes <sterling_at_designmultimedia.com>
Date: Sun, 6 May 2001 15:19:52 -0400 (EDT)

On Mon, 7 May 2001, Daniel Stenberg wrote:

> On Sat, 5 May 2001, Max WebMaster wrote:
>
> > I want to download some pages with curl off a Japanese site. Via browser
> > it is no problem (I have Japanese fonts installed) but curl naturally
> > wants to send in the link the double-byte expression for each character.
> > The web site does not understand it and refuses the connection.
>
> RFC2396 (http://curl.haxx.se/rfc/rfc2396.txt) details how URLs are to be
> written. Curl itself performs no magic on the input string but expects it to
> be correct.
>
> Now, I'm not really up to speed with how localized strings are supposed to
> work in URLs (I figure they're UTF-8 encoded at some point to hide this fact
> from lower layers).
>
> You need to enter all special characters as '%[2-digit-code]', as section
> 2.1 of RFC2396 describes:
>
> For original character sequences that contain non-ASCII characters,
> however, the situation is more difficult. Internet protocols that
> transmit octet sequences intended to represent character sequences
> are expected to provide some way of identifying the charset used, if
> there might be more than one [RFC2277]. However, there is currently
> no provision within the generic URI syntax to accomplish this
> identification. An individual URI scheme may require a single
> charset, define a default charset, or provide a way to indicate the
> charset used.
>
> It is expected that a systematic treatment of character encoding
> within URI will be developed as a future modification of this
> specification.
>
> > What now? How do I encode Japanese characters into a URL string?
>
> If no one else around can provide info on this subject, I'd recommend that
> you "spy" on the request sent by your browser and clone that string to use
> with curl!
>
>

Perhaps a function should be added to cURL to urlencode these strings
(either as an option or by default). We have such a function in the PHP
source that could easily be ported to cURL (it's a pretty standard
routine, and I'll do it if you want). It wouldn't affect existing code
(i.e., it won't improperly munge correctly encoded URLs), and might be a
convenience to users.
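
To make the idea concrete, here is a minimal sketch of such a routine. It is
written in Python for brevity, not taken from the PHP source or from cURL, and
the names escape_url, SAFE and charset are hypothetical. It assumes the caller
knows which charset the server expects (UTF-8 by default here; a Japanese site
of this era might well want Shift_JIS or EUC-JP instead). Existing %XX escapes
are passed through untouched, so a correctly encoded URL is not munged.

```python
import re

# Bytes copied through unchanged: the RFC 2396 unreserved and reserved
# characters, plus '#' so that a fragment delimiter survives.
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"-_.!~*'()"     # unreserved marks
    b";/?:@&=+$,"    # reserved characters
    b"#"             # fragment delimiter
)

# An escape sequence that is already present, e.g. %E6 or %2f.
_ESCAPE = re.compile(rb"%[0-9A-Fa-f]{2}")


def escape_url(url: str, charset: str = "utf-8") -> str:
    """Percent-encode every byte of url that is not allowed as-is.

    The charset is an assumption supplied by the caller: the generic URI
    syntax gives no way to discover which one the server expects.
    """
    raw = url.encode(charset)
    out = []
    i = 0
    while i < len(raw):
        if _ESCAPE.match(raw, i):
            # Already escaped: copy the three characters through unchanged.
            out.append(raw[i:i + 3].decode("ascii"))
            i += 3
        elif raw[i] in SAFE:
            out.append(chr(raw[i]))
            i += 1
        else:
            # Everything else (spaces, non-ASCII bytes, a stray '%') is escaped.
            out.append("%%%02X" % raw[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(escape_url("http://example.com/日本"))
```

Because escaped sequences are left alone, running the function twice gives the
same result as running it once. The one ambiguity it cannot resolve is a
literal '%' that happens to be followed by two hex digits: that is always
treated as an existing escape.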

-Sterling
Received on 2001-05-07