From: Erik van der Poel (erik@vanderpoel.org)
Date: Sat Feb 19 2005 - 19:37:55 CST
Peter Kirk wrote:
> On 19/02/2005 22:21, Erik van der Poel wrote:
>> ... Michel Suignard himself (long-time Unicoder) already admitted that:
>>
>> # No languages used in the former soviet union should require a mix of
>> # latin and cyrillic in a single dns label.
>> # Unicode contains many latin homographs in the Cyrillic block exactly
>> # for that reason, to avoid mixing the two scripts in a single word. ...
>>
> Michel may have "admitted" this
I feel I have to apologize. Again. Not so much to Michel, but to the
other senior Unicoders, for my comment about backpedaling. They didn't
say what Michel did. Michel did. The others are not to blame.
> but it is nevertheless untrue in at least two ways:
>
> 1) Kurdish (Cyrillic), Udi and Wakhi are languages of the former Soviet
> Union which require a mix of Latin and Cyrillic within their ordinary
> orthography;
Yes, I read the first half or two thirds of the following:
http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html
In particular, the discussion of "to conflate, or to disunify?" was
fascinating stuff. A very good read indeed.
> 2) A Russian, Alexander, who presumably knows the situation with his own
> language better than Michel does, has given a real example of how Latin
> and Cyrillic need to be mixed in Russian IDNs. And who is Michel to tell
> the Russians how to write their own language, or that they are not
> allowed to use international acronyms like XML within their IDNs?
Good point. I cannot speak for Michel here, but I strongly doubt that he
is trying to tell the Russians what to do.
> All that this shows is that there is no easy answer to the spoofing
> problem. At least, a simplistic ban on mixed scripts doesn't work. A
> confusables mapping might provide a solution, but I have seen no good
> suggestions on how this might be presented to an end user.
I have high hopes for Neil Harris' algorithm, which looks for strings
that consist entirely of homographs in a context where those would not
be expected. The feedback to the user could be simply to leave such
domain names in Punycode form. Hopefully, the user will look at the
domain name before typing in a credit card number.
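To make the idea concrete, here is a minimal sketch in Python of how such a check might look. It is only one reading of the description above, not Neil Harris' actual algorithm. The lookalike table, the list of TLDs where Cyrillic is "expected", and the function names are all hypothetical and deliberately incomplete. The Punycode form is produced with Python's built-in "punycode" codec plus the "xn--" prefix, skipping nameprep.

```python
# Hypothetical, incomplete table of Cyrillic letters that look like Latin
# ones: a, e, o, p, c, y, x, s, i, j (all given as Cyrillic code points).
CYRILLIC_LOOKALIKES = set(
    "\u0430\u0435\u043e\u0440\u0441\u0443\u0445\u0455\u0456\u0458"
)

# Hypothetical "context": TLDs under which Cyrillic labels would be expected.
CYRILLIC_EXPECTED_TLDS = {"ru", "ua", "bg"}


def is_cyrillic(ch):
    """True if ch lies in the basic Cyrillic block (U+0400..U+04FF)."""
    return "\u0400" <= ch <= "\u04ff"


def looks_spoofed(label, tld):
    """Flag a label whose Cyrillic letters are all Latin lookalikes,
    unless Cyrillic is expected under this TLD."""
    cyrillic = [ch for ch in label.lower() if is_cyrillic(ch)]
    if not cyrillic or tld.lower() in CYRILLIC_EXPECTED_TLDS:
        return False
    return all(ch in CYRILLIC_LOOKALIKES for ch in cyrillic)


def display_form(label, tld):
    """Return the label as it would be shown to the user: Punycode (ACE)
    form if it looks spoofed, the original Unicode form otherwise."""
    if looks_spoofed(label, tld):
        return "xn--" + label.encode("punycode").decode("ascii")
    return label
```

With this sketch, a label such as "\u0440\u0430\u0443\u0440\u0430l" (Cyrillic lookalikes plus a Latin "l") under .com would be shown in its xn-- form, while a mixed label like "xml-\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u044b" would be displayed as typed, because its Cyrillic letters are not all lookalikes. That keeps the legitimate Latin-plus-Cyrillic cases Peter mentions readable.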
Erik