curl-users

Re: libcurl and IDNA

From: Gisle Vanem <gvanem_at_broadpark.no>
Date: Wed, 7 Apr 2004 13:59:07 +0200

"Daniel Stenberg" <daniel-curl_at_haxx.se> said:

> > What would happen in curl now if one enters some IDN in some east-asian
> > encoding? I guess it would break in sscanf() etc. (but maybe UTF-8 works?)
>
> I haven't paid enough attention on how the URL would be formatted in these
> cases. You have any examples?

No, I don't know how they would enter an IDN (except via browsers that handle
UTF-8 etc). I assume most charsets are supersets of US-ASCII, so entering a
normal hostname on those OSes should be no problem.

And Win-XP (or Internet Explorer?) seems to support non-ASCII names, but will
send a DNS query for the UTF-8 encoded name. AFAIK there's only one draft for
this:
http://public.research.mimesweeper.com/Standards/IETF/Draft/draft-skwan-utf8-dns-02.txt

But support for UTF-8 requires upgrading the DNS servers, while IDNA
does not. So IDNA (RFC 3490) will most likely be deployed faster and be
supported by most registrars.

> 'www.tromsø.no' is not really a host name I can use with curl on my Linux box
> since the resolver refuses to resolve it to an IP address:
>
> curl: (6) Couldn't resolve host 'www.tromsø.no'
>
> www.xn--troms-zuA.no works though.

Ah, I forgot I had "195.159.151.136 www.tromsø.no" in my hosts file.
'curl http://195.159.151.136' and 'curl http://www.xn--troms-zuA.no'
should give different pages.

> I think we are gonna see quite a lot of domains using non-ASCII characters
> within shortly, so I think getting curl to work with them is a rather
> important task.

I agree. I need to identify the places where a domain name should be
converted to/from ACE form. AFAICS most of it is done in CreateConnection().
Is conn->gname the only place a domain name is stored? And does
conn->hostname always point to conn->gname[]? If so, it would probably
be easiest to do:
  rc = idna_to_ascii_lz (conn->gname, &conn->hostname, 0);

and then, when conn->hostname != conn->gname, assume it's ACE
encoded.
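
As a rough illustration of the checks involved (not actual libcurl code), the
two helpers below are hypothetical names made up for this sketch. The first
tests whether a host name contains bytes outside US-ASCII and so would need
converting before a plain DNS lookup. The second tests whether any
dot-separated label already carries the "xn--" ACE prefix from RFC 3490,
which could serve as an alternative to the pointer comparison. The conversion
itself would still be left to idna_to_ascii_lz().

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any byte in 'host' is outside
   US-ASCII, i.e. the name cannot be handed to the resolver as-is. */
static int host_has_non_ascii(const char *host)
{
  const unsigned char *p = (const unsigned char *)host;
  for(; *p; p++) {
    if(*p >= 0x80)
      return 1;
  }
  return 0;
}

/* Hypothetical helper: returns 1 if at least one dot-separated label
   starts with the ACE prefix "xn--". The prefix is compared
   case-insensitively. */
static int host_has_ace_label(const char *host)
{
  const char *label = host;
  while(label && *label) {
    /* Short-circuit evaluation stops at the first mismatch, so this
       never reads past the terminating NUL. */
    if(tolower((unsigned char)label[0]) == 'x' &&
       tolower((unsigned char)label[1]) == 'n' &&
       label[2] == '-' && label[3] == '-')
      return 1;
    label = strchr(label, '.');
    if(label)
      label++; /* step past the dot to the next label */
  }
  return 0;
}
```

With these, the UTF-8 form of the name above would be flagged as needing
conversion by host_has_non_ascii(), while host_has_ace_label() would
recognise "www.xn--troms-zuA.no" as already being in ACE form.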

--gv
Received on 2004-04-07