From: Hans Aberg (haberg@math.su.se)
Date: Sat Feb 19 2005 - 13:05:21 CST
At 21:54 -0800 2005/02/18, Doug Ewell wrote:
>Keld Jørn Simonsen <keld at dkuug dot dk> wrote:
>
>> Of course we should minimize the risks for internet users of being
>> misled. This could be done by equalizing similar characters,
>> like Latin, Cyrillic and Greek A, 0 and O, 1 and 1 etc, so that no
>> visual misleading should be possible.
>
>OK, now fill in the "et cetera," now that you've got the obvious ones
>out of the way.
>
>As long as Erik has already mentioned it (thank you very much), see my
>post from 3 years ago to see how this task quickly goes from simple to
>tricky to impossible:
>
>http://ops.ietf.org/lists/idn/idn.2002/msg00498.html
If one does it that way, one quickly gets into trouble. But one can instead
define a map that merges some characters for the purpose of distinguishing
IDNs, while retaining the original Unicode character set at the user level
for input. Take a character set C, which might be a subset of Unicode, and
send the Unicode characters (or a suitable subset thereof) into the set of
finite sequences over C. Two IDNs are declared equal if they are mapped to
the same character sequence. This map is only used to define which IDNs are
viewed as equal; one is still free to use whatever Unicode sequences one
wants. The map defines an equivalence relation on the set of Unicode
character sequences, but does not in itself affect which Unicode sequences
are admissible.
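
To make the idea concrete, here is a minimal Python sketch of such a map. The
table MERGE and the names skeleton and same_idn are hypothetical, chosen only
for illustration; the three entries are just the obvious examples from the
quoted message, not a proposal for a real table.

```python
# Hypothetical merge table: each listed character is sent to a short
# sequence over the smaller set C. The entries are illustrative only.
MERGE = {
    "\u0410": "A",  # Cyrillic capital A -> Latin A
    "\u0391": "A",  # Greek capital Alpha -> Latin A
    "0": "O",       # digit zero -> Latin capital O
}


def skeleton(name: str) -> str:
    """Map a Unicode string to its image over C.

    Characters not listed in MERGE are mapped to themselves.
    """
    return "".join(MERGE.get(ch, ch) for ch in name)


def same_idn(a: str, b: str) -> bool:
    """Declare two names equal exactly when their images coincide."""
    return skeleton(a) == skeleton(b)
```

Note that skeleton is only used for comparison: the name as entered by the
user is never rewritten, so the set of admissible Unicode sequences is left
untouched.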
This is in fact a common math method: If two objects need to be separated,
define a map which separates them. If many objects need to be separated, one
may use several maps, combined into a single function whose codomain is the
product of the individual codomains, until the properties one wants to
capture are separated. So if the map defined above fails, in other
circumstances, to separate the characters the way one wants, define another
map.
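
A small Python sketch of combining several maps into one product-valued map
may help. All names here (product_map, merge_lookalikes, count_non_ascii) are
hypothetical, and the two component maps are toy examples picked only to show
the mechanism.

```python
from typing import Callable, Hashable, Sequence, Tuple


def product_map(
    maps: Sequence[Callable[[str], Hashable]],
) -> Callable[[str], Tuple[Hashable, ...]]:
    """Combine several maps into one map into the product of their codomains."""
    def combined(name: str) -> Tuple[Hashable, ...]:
        return tuple(f(name) for f in maps)
    return combined


def merge_lookalikes(name: str) -> str:
    """Toy map that identifies Cyrillic capital A with Latin A."""
    return name.replace("\u0410", "A")


def count_non_ascii(name: str) -> int:
    """Toy map that counts the characters outside the ASCII range."""
    return sum(1 for ch in name if ord(ch) > 127)


# The first map alone does not separate these two strings,
# but the product of both maps does.
combined = product_map([merge_lookalikes, count_non_ascii])
```

Two strings are separated by the combined map as soon as at least one of the
component maps separates them, which is why adding another map can only
refine the distinction, never coarsen it.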
Hans Aberg