From: Doug Ewell (dewell@adelphia.net)
Date: Sat Feb 19 2005 - 15:17:56 CST

Hans Aberg <haberg at math dot su dot se> wrote:
>>> Of course we should minimize the risks for internet users of being
>>> misled. This could be done by equalizing similar characters,
>>> like Latin, Cyrillic and Greek A, 0 and O, 1 and l etc, so that no
>>> visual confusion should be possible.
>>
>> OK, now fill in the "et cetera," now that you've got the obvious ones
>> out of the way.
>>
>> As long as Erik has already mentioned it (thank you very much), see
>> my post from 3 years ago to see how this task quickly goes from
>> simple to tricky to impossible:
>>
>> http://ops.ietf.org/lists/idn/idn.2002/msg00498.html
>
> If one does it that way, one quickly gets into trouble. Instead, one
> defines a map that merges some characters for the purpose of telling
> IDNs apart, while retaining the original Unicode character set at the
> user level on input. Take a character set C, which might be a subset
> of Unicode, and send the Unicode characters (or a suitable subset
> thereof) into the set of finite sequences over C. Two IDNs will be
> declared equal if they are mapped to the same character sequence. This
> map is only used to define which IDNs are viewed as equal. But one is
> still free to use whatever Unicode sequences one wants. The map is
> used to define an equivalence relation on the set of Unicode character
> sequences, but does not in itself affect which Unicode sequences are
> admissible.

That isn't the point. It doesn't matter if the mapping takes place at
the character-encoding level or at some other level. The problem of
determining which pairs of characters are confusable and should be
folded together, and which pairs aren't and shouldn't, still remains to
be solved.
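
To make the quoted proposal concrete, here is a minimal sketch in Python.
The names FOLD, fold, and same_idn are hypothetical, and the three table
entries are illustrative assumptions, not a vetted list. It also ignores
case folding and everything else a real IDN comparison would need.

```python
# Hypothetical fold table: each listed character maps to a sequence over C.
# These entries are illustrative assumptions only.
FOLD = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A -> LATIN CAPITAL LETTER A
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA -> LATIN CAPITAL LETTER A
    "0": "O",       # DIGIT ZERO -> LATIN CAPITAL LETTER O
}


def fold(label: str) -> str:
    """Map each character to its sequence over C; unlisted characters map to themselves."""
    return "".join(FOLD.get(ch, ch) for ch in label)


def same_idn(a: str, b: str) -> bool:
    """Two labels are declared equal if they fold to the same sequence."""
    return fold(a) == fold(b)
```

With this table, same_idn("B0B", "BOB") is true only because someone chose
to list 0 -> O, while "l1" and "11" stay distinct because nobody listed
that pair. The mechanism takes a few lines; deciding what goes in the table
is the part that is still open.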

The point is that there is a large set of "semi-confusable" characters,
for which it simply cannot be said conclusively that they "look alike"
or "don't look alike." It depends on font, size, medium (paper vs.
screen), and sometimes context. This is one of those problems for which
a partial solution simply isn't good enough.

And for something like IDNs, once you have decided on a mapping, you can
never, ever change it. Otherwise you will have a domain name available
for registration by customer A today, but a similar one not available to
customer B six months later (or vice versa, A can't get it but B can).
Either way, you have a lawsuit.
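
The availability problem can be sketched in a few lines of Python. This
assumes a hypothetical registry that refuses a label when it folds to the
same sequence as one already registered; TABLE_V1, TABLE_V2, fold, and
is_available are made-up names for illustration, and the table entries are
not a real confusables list.

```python
def fold(label, table):
    """Map each character through the table; unlisted characters map to themselves."""
    return "".join(table.get(ch, ch) for ch in label)


def is_available(label, registered, table):
    """A label is available if no registered label folds to the same sequence."""
    key = fold(label, table)
    return all(fold(r, table) != key for r in registered)


TABLE_V1 = {"\u0410": "A"}            # hypothetical first version of the mapping
TABLE_V2 = {"\u0410": "A", "0": "O"}  # hypothetical revision that adds 0 -> O

registered = ["BOB"]

print(is_available("B0B", registered, TABLE_V1))  # True
print(is_available("B0B", registered, TABLE_V2))  # False
```

The same label against the same registry gets a different answer depending
only on which version of the table is in force when the request arrives.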

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/