Re: Japanese characters in URL
Date: Sun, 6 May 2001 15:19:52 -0400 (EDT)
On Mon, 7 May 2001, Daniel Stenberg wrote:
> On Sat, 5 May 2001, Max WebMaster wrote:
> > I want to download some pages with curl from a Japanese site. Via a browser
> > it is no problem (I have Japanese fonts installed), but curl naturally
> > sends the raw double-byte value of each character in the link.
> > The web site does not understand it and refuses the connection.
> RFC2396 (http://curl.haxx.se/rfc/rfc2396.txt) details how URLs are to be
> written. Curl itself performs no magic on the input string but expects it to
> be correct.
> Now, I'm not really up to speed on how localized strings are supposed to
> work in URLs (I figure they're UTF-8'ed at some point to hide this from
> lower layers).
> You need to enter all special characters as '%[2-digit-code]', as section
> 2.1 of RFC2396 describes:
> For original character sequences that contain non-ASCII characters,
> however, the situation is more difficult. Internet protocols that
> transmit octet sequences intended to represent character sequences
> are expected to provide some way of identifying the charset used, if
> there might be more than one [RFC2277]. However, there is currently
> no provision within the generic URI syntax to accomplish this
> identification. An individual URI scheme may require a single
> charset, define a default charset, or provide a way to indicate the
> charset used.
> It is expected that a systematic treatment of character encoding
> within URI will be developed as a future modification of this
> specification.
> > What now? How do I encode Japanese characters into a URL string?
> If no one else around can provide info on this subject, I'd recommend that
> you "spy" on the request sent by your browser and clone that string to use
> with curl!
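To illustrate the '%[2-digit-code]' rule, here is a minimal C sketch of percent-encoding a byte string. It assumes the page expects the characters as UTF-8 bytes; as the RFC text notes, URIs carry no charset indication, so a Japanese site may instead expect Shift_JIS or EUC-JP, which yields different escapes (that is why cloning the browser's request is the safest check). The function name `percent_encode` is hypothetical, and because it also escapes '/', '?' and friends it is meant for a single path segment or query value, not a whole URL.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero for RFC 2396 "unreserved" characters:
   ASCII letters, digits and - _ . ! ~ * ' ( ) */
static int is_unreserved(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("-_.!~*'()", c) != NULL;
}

/* Hypothetical helper: writes a %XX-escaped copy of the byte string `in`
   into `out`. Every byte that is not unreserved (including every byte
   >= 0x80, i.e. each byte of a multi-byte Japanese character) becomes
   '%' plus two uppercase hex digits. `out` must hold at least
   3 * strlen(in) + 1 bytes. */
static void percent_encode(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *)in;

    for (; *p != '\0'; p++) {
        if (is_unreserved(*p)) {
            *out++ = (char)*p;
        } else {
            *out++ = '%';
            *out++ = hex[*p >> 4];   /* high nibble */
            *out++ = hex[*p & 0x0F]; /* low nibble */
        }
    }
    *out = '\0';
}
```

For example, the UTF-8 bytes of the two characters 日本 (E6 97 A5 E6 9C AC) come out as `%E6%97%A5%E6%9C%AC`.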
Perhaps a function should be added to cURL to urlencode these strings
(either as an option or by default)... We have such a function in the PHP
source that could easily be ported to cURL (it's a pretty standard
routine, and I'll do it if you want). It wouldn't affect existing code
(i.e., it won't improperly munge correctly encoded URLs), and might be a
convenience for users...
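One way to get the "won't munge correctly encoded URLs" property is to escape only bytes that can never appear literally in a URL, while passing existing `%XX` escapes and the reserved delimiters (`/ ? & = :` etc.) through untouched. The C sketch below shows that idea; it is a hypothetical `escape_url_unsafe`, not a port of the PHP routine, and it assumes the input bytes are already in whatever charset the server expects.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: escapes only bytes that may never appear
   literally in a URL: control characters, space, DEL, bytes >= 0x80,
   and the RFC 2396 "delims"/"unwise" characters < > " { } | \ ^ [ ] `.
   Reserved delimiters and existing "%XX" escapes are copied as-is,
   so running it on its own output changes nothing. A '%' that is not
   followed by two hex digits is escaped as %25.
   `out` must hold at least 3 * strlen(in) + 1 bytes. */
static void escape_url_unsafe(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *)in;

    for (; *p != '\0'; p++) {
        int keep;
        if (*p == '%')
            /* short-circuit: p[2] is only read if p[1] is a hex digit */
            keep = isxdigit(p[1]) && isxdigit(p[2]);
        else
            keep = *p > 0x20 && *p < 0x7F &&
                   strchr("<>\"{}|\\^[]`", *p) == NULL;

        if (keep) {
            *out++ = (char)*p;
        } else {
            *out++ = '%';
            *out++ = hex[*p >> 4];
            *out++ = hex[*p & 0x0F];
        }
    }
    *out = '\0';
}
```

Because every '%' it emits is followed by two hex digits and every other output byte is one it would keep, the function is idempotent, which is what makes it safe to apply by default.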
Received on 2001-05-07