Re: option to disallow IDN ?
From: Timothe Litt <litt_at_acm.org>
Date: Fri, 16 Dec 2022 14:58:20 -0500
On 16-Dec-22 14:19, Dan Fandrich via curl-library wrote:
> On Fri, Dec 16, 2022 at 01:18:12PM -0500, Timothe Litt via curl-library wrote:
>> And/or the callback registration could specify "all domain names", "Just IDN" -
> The browsers (at least Firefox) do something subtle but pretty useful for
> avoiding spoofing. Based on the name registration policies of the TLD being
> used, they either show the IDN as expected in the URL bar, or just show the
> ugly punycode version of the name. TLDs with policies that forbid names that
> could lead to confusion (homographic attacks) get the desired behaviour (of
> seeing the IDN name) but those without policies, or with policies that could
> lead to confusion get the punycode version, making it obvious that some
> spoofing may have gone on to get you to that web page. Mozilla's original
> policy can be seen here:
> https://www-archive.mozilla.org/projects/security/tld-idn-policy-list
>
> They've amended that policy since to allow displaying IDN in some cases even on
> those TLDs with bad or nonexistent policies. This only happens if all the
> characters in the TLD come from the same script. If a TLD mixes, for example,
> Cyrillic and Latin characters, it's displayed as punycode, but all Cyrillic is
> shown in all its UNICODE glory. The idea is that people (who can read that
> script) will recognize the different characters within that script and be able
> to tell them apart, and there won't be any mixing of similar-looking characters
> within a single domain name. That policy can be seen at
> https://wiki.mozilla.org/IDN_Display_Algorithm
>
> Lots of thought has been given to this problem already (Mozilla seems to have
> implemented the first policy 17 years ago), and curl could take advantage of
> that. But, since it's not a browser it can't use the same means of notifying
> the user (displaying punycode in the URL bar), but some viable alternatives
> to that have already been brought up here.
>
> Dan
As you say, curl isn't a browser. And hardcoding the TLDs' policies
seems like whack-a-mole.
A simple callback function in the library, passed every domain name,
would be fairly cheap and would allow any policy. It's up to the UI to
decide how to handle issues. curl could provide a sample policy such as
the one I outlined. Perhaps the policy could be a loadable DLL, e.g.
host-name-filter=idn-alias loads idn-alias.{so,dll,...}
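To make the shape of that hook concrete, here is a minimal sketch in C. None of these names exist in libcurl today; the callback type, the "registration" struct, and the sample policy are all hypothetical stand-ins for the proposed API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (NOT part of libcurl): a host-name filter callback the
 * application registers once; the library would call it with each name
 * before resolving it. Returns nonzero to allow the name. */
typedef int (*host_filter_cb)(const char *hostname, void *userdata);

/* Stand-in for the proposed registration. In a real design this would
 * be a curl_easy_setopt() option; here it is just a struct. */
struct filter_slot {
    host_filter_cb cb;
    void *userdata;
};

/* What the library would do internally: if no callback is registered,
 * the name is accepted - the "call skipped if no function registered"
 * default mentioned below. */
int name_allowed(const struct filter_slot *slot, const char *hostname)
{
    if (slot->cb == NULL)
        return 1;                      /* default policy: allow everything */
    return slot->cb(hostname, slot->userdata);
}

/* A deliberately crude example policy, only to show the shape: reject
 * any name containing a punycode (IDN) label. */
int reject_idn(const char *hostname, void *userdata)
{
    (void)userdata;
    return strstr(hostname, "xn--") == NULL;
}
```

The point is that the mechanism stays this small regardless of how elaborate the policy behind the callback becomes.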
I think curl is best at being a tool, not a policy engine. So making
policies pluggable seems in line with the philosophy; hard-coding
anything more involved than a default (and generally agreed-upon) list
of homographic characters that triggers a warning unless whitelisted
doesn't.
Unlike browsers, curl usually doesn't wander the web at random or follow
search engine results or e-mailed links. If you set up or use a curl
command, the risk seems a lot less - you have to think. A warning is
a good safety net - but if you're in Japan, clearly you don't want to be
warned about every local website. Then you get into the same automatic
'I'm annoyed, just say yes' syndrome seen with self-signed
certificates. Thus, a whitelist...
So I think that the curl command could reasonably provide a simple 'warn
on IDNs with risky characters' along with a whitelist. I don't think
that trying to replicate the UI of a browser with complex policies is
worthwhile (or feasible). A filter function hook in the library would
allow experimentation and arbitrarily complex policies. And making the
function loadable (in the command line tool), decouples the policy from
curl proper.
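A sketch of that "warn on risky IDNs unless whitelisted" default, again purely illustrative (the whitelist entries and the risk test are placeholders; a real version would read the whitelist from a file and use a proper confusable-character check):

```c
#include <string.h>

enum verdict { NAME_OK, NAME_WARN };

/* Hypothetical user whitelist; in practice this would be loaded from
 * the user's configuration. */
static const char *whitelist[] = { "xn--wgv71a.jp", NULL };

enum verdict check_name(const char *hostname)
{
    /* Crude risk test: any punycode label makes the name "risky".
     * A real policy would inspect the decoded characters instead. */
    if (strstr(hostname, "xn--") == NULL)
        return NAME_OK;                /* plain ASCII name: nothing to flag */

    for (int i = 0; whitelist[i] != NULL; i++)
        if (strcmp(hostname, whitelist[i]) == 0)
            return NAME_OK;            /* user opted in to this IDN */

    return NAME_WARN;                  /* risky and not whitelisted: warn */
}
```

This is the whole safety net: warn once, let the user whitelist the names they actually use, and stay out of the way after that.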
Along those lines, you could imagine a filter policy that reads a file
of regexes, one that uses Mozilla's code to decide, simple white/black
lists, or ... anything you can imagine. And if it's in a DLL selected
by an option in .curlrc, it's pretty painless for the user. Plus,
any other application that uses the library can link with curl's sample
policy if it suits the application.
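As one example of such a pluggable policy, here is a sketch of the regex-file idea using POSIX regcomp/regexec. The patterns are hard-coded where a real plug-in would read them from a file, and the fixed-size table is just to keep the sketch short:

```c
#include <regex.h>
#include <string.h>

#define MAX_PATTERNS 16

/* A deny-list policy: a host name matching any compiled pattern is
 * rejected. In a real plug-in the patterns would come from a file
 * named in .curlrc. */
struct regex_policy {
    regex_t re[MAX_PATTERNS];
    int count;
};

/* Compile and add one pattern; returns 0 on success, -1 on a bad
 * pattern or a full table. */
int policy_add(struct regex_policy *p, const char *pattern)
{
    if (p->count >= MAX_PATTERNS)
        return -1;
    if (regcomp(&p->re[p->count], pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    p->count++;
    return 0;
}

/* Returns 1 if the name matches no deny pattern, 0 otherwise. */
int policy_allows(const struct regex_policy *p, const char *hostname)
{
    for (int i = 0; i < p->count; i++)
        if (regexec(&p->re[i], hostname, 0, NULL, 0) == 0)
            return 0;
    return 1;
}
```

Swapping this for a Mozilla-style script-mixing check, or a plain white/black list, changes nothing on the library side - only the plug-in differs.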
Anyhow, that's my 3 cents. I don't think a ban on IDNs is useful. I do
think a flexible policy is required, and that the policy should be
customizable and isolated from the mechanism. The suggested 'is this
name ok' hook in the library does this at minimal cost. (The default
function can be 'return true'... or the call skipped if no function
registered.)
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
Received on 2022-12-16