Re: [idn] IDN spoofing

From: Erik van der Poel (erik@vanderpoel.org)
Date: Sat Feb 19 2005 - 19:37:55 CST


    Peter Kirk wrote:
    > On 19/02/2005 22:21, Erik van der Poel wrote:
    >> ... Michel Suignard himself (long-time Unicoder) already admitted that:
    >>
    >> # No languages used in the former soviet union should require a mix of
    >> # latin and cyrillic in a single dns label.
    >> # Unicode contains many latin homographs in the Cyrillic block exactly
    >> # for that reason, to avoid mixing the two scripts in a single word. ...
    >>
    > Michel may have "admitted" this

    I feel I have to apologize. Again. Not so much to Michel, but to the
    other senior Unicoders, for my comment about backpedaling. They didn't
    say what Michel did. Michel did. The others are not to blame.

    > but it is nevertheless untrue in at least two ways:
    >
    > 1) Kurdish (Cyrillic), Udi and Wakhi are languages of the former Soviet
    > Union which require a mix of Latin and Cyrillic within their ordinary
    > orthography;

    Yes, I read the first half or two thirds of the following:

    http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html

    In particular, the discussion of "to conflate, or to disunify?" was
    fascinating stuff. A very good read indeed.

    > 2) A Russian, Alexander, who presumably knows the situation with his own
    > language better than Michel does, has given a real example of how Latin
    > and Cyrillic need to be mixed in Russian IDNs. And who is Michel to tell
    > the Russians how to write their own language, or that they are not
    > allowed to use international acronyms like XML within their IDNs?

    Good point. I cannot speak for Michel here, but I strongly doubt that he
    is trying to tell the Russians what to do.

    > All that this shows is that there is no easy answer to the spoofing
    > problem. At least, a simplistic ban on mixed scripts doesn't work. A
    > confusables mapping might provide a solution, but I have seen no good
    > suggestions on how this might be presented to an end user.

    I have high hopes for Neil Harris' algorithm, involving looking for
    strings that consist entirely of homographs, within a context where
    those would not be expected. The feedback to the user could be to simply
    leave those domain names in Punycode form. Hopefully, the user will look
    at the domain name before typing in a credit card number.
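
The idea in the paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Neil Harris' actual algorithm, whose details are not given in this message. The lookalike table, the set of "unexpected context" TLDs, and all function names are hypothetical; a real implementation would need a much fuller confusables table and a more careful notion of context.

```python
# Hypothetical, deliberately tiny table of Cyrillic letters that look like
# Latin letters. A real confusables table would be far larger.
CYRILLIC_LATIN_LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
}

# Assumption: TLDs under which an all-lookalike Cyrillic label is unexpected.
UNEXPECTED_CONTEXT_TLDS = {"com", "net", "org"}


def is_all_lookalikes(label):
    """True if the label is non-empty and every character is a known lookalike."""
    return bool(label) and all(ch in CYRILLIC_LATIN_LOOKALIKES for ch in label)


def to_punycode(label):
    """Return the ASCII (xn--) form of a label; ASCII labels pass through."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_form(domain):
    """Show a domain in Punycode if it looks suspicious, else unchanged."""
    labels = domain.lower().split(".")
    tld = labels[-1]
    suspicious = tld in UNEXPECTED_CONTEXT_TLDS and any(
        is_all_lookalikes(label) for label in labels[:-1]
    )
    if not suspicious:
        return domain
    return ".".join(to_punycode(label) for label in labels)
```

With this sketch, a label made up only of Cyrillic lookalikes under .com would be shown in its xn-- form, while the same label under .ru, or a Cyrillic label containing letters with no Latin twin, would be displayed as typed. Note that it does not catch a label that mixes a single Cyrillic lookalike into otherwise Latin text; that case would need a separate mixed-script check.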

    Erik


