From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Feb 11 2005 - 03:56:25 CST
I did not read the bugzilla thread.
On Friday, February 11th, 2005 04:48Z Murray Sargent wrote:
> It's the alphabetic characters of
> Latin, Greek and Cyrillic that shouldn't be mixed, or the user may
> suffer consequences no user should have to endure.
I think I remember seeing a Cyrillic Q being registered, or on track to be
registered (sorry, too lazy to check the code; I am sure someone will answer
this post and give it). Surely this means that _this_ Q will not be a
problem: one should _not_ have to use a Latin Q among Cyrillic letters just
to have one's name written correctly (which is, in the end, the very point
of IDN).
However, it also means that linguists working on lesser-used languages do NOT
stop at script frontiers: they globalize, they DO mix characters from
differing "alphabets" in order to accommodate unexpected uses. Saying that
Unicode should register the "new" use of a character _before_ a name can
be registered is just going to make people unhappy about lengthy procedures,
and it also puts a bit more pressure on Unicode and WG2, unnecessarily.
Also, determining the frontier is not an easy job in general. Of course it
should be fairly obvious for Latin, Cyrillic and Greek, but when you consider
Japanese, which mixes three scripts, things are a bit different; and when one
comes to the Indian scripts, where Devanagari signs are re-used with the
other scripts for Sanskrit... Also consider how to deal with Coptic vs.
Greek. All these strange cases will have to be dealt with in software; so
first it will take several years for all the IDN libraries to get it right
(with the piles of upset users, bug reports and upset maintainers), and in
the meantime it would make a perfect terrain for hackers, in much the same
way we had problems a few years ago with malformed UTF-8 strings.
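To make the frontier problem concrete, here is a rough sketch of a naive
mixed-script check. It is only an illustration, not what any IDN library
does: Python's standard library does not expose the Unicode Script property,
so the sketch assumes (loudly) that the first word of a character's Unicode
name is a usable stand-in for its script. The function names are made up for
this example.

```python
import unicodedata

def scripts_in(label):
    """Approximate the set of scripts used by the letters of a label.

    ASSUMPTION: the first word of the Unicode character name ("LATIN",
    "CYRILLIC", "HIRAGANA", ...) is used as a proxy for the Script
    property. This is a heuristic, not the real property.
    """
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits, hyphens and symbols carry no script here
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed(label):
    """True when the letters of the label come from more than one script."""
    return len(scripts_in(label)) > 1

# A spoof with one Cyrillic letter is caught...
print(is_mixed("p\u0430ypal"))
# ...but so is perfectly ordinary Japanese (kanji + hiragana + katakana).
print(is_mixed("\u65e5\u672c\u306e\u30c9\u30e1\u30a4\u30f3"))
```

The second call is the point: a blanket "no mixed scripts" rule rejects
legitimate Japanese labels, so real software needs per-script exceptions.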
While the Greek equivalent (ραγραΙ) does not look anywhere near as
attractive as Addison's example, I notice that the narrow characters do
not seem to be outlawed (I hope I missed something here ;-)), so we also have
paypal (looks funny but correct) :(. Nor do I see any restriction on the
use of payp@l (it will certainly have its share of success in countries where
there has been a lot of hype about the Internet, such as here in Spain).
Similarly paypaɭ, or even just paypaŀ or paypał or payp⒜l.
And of course there are examples that pose no problem at all once people get
the correct fonts, like ᏢᎪᎩᏢᎪᏞ (Cherokee).
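Since several of these strings are hard to tell apart on screen, a small
sketch that lists what each one is actually made of may help. It uses only
Python's standard unicodedata module; the helper name describe is
hypothetical.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Two of the look-alikes: an all-Latin one and one using a plain ASCII symbol.
for candidate in ("paypa\u0142", "payp@l"):
    for code_point, name in describe(candidate):
        print(code_point, name)
    print()
```

Note that a string such as paypał contains Latin letters only, so a check
that looks merely for a change of script would let it through.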
As Allison said, it is just a game of cat and mouse: disallowing mixed
scripts is (would have been, really) NOT the definitive solution, it will
just require the bad guys to be a little bit more clever. As the first example
shows clearly, since there could be money behind it, we can assume they _will_
be clever.
OTOH, highlighting punycode in the address bar seems a good idea to me.
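As a rough idea of what such highlighting could look like, the sketch below
converts each label of a host name to its ASCII-compatible (punycode) form
with Python's built-in "idna" codec and brackets the labels that needed
encoding. The bracket notation and the function name display_form are
assumptions made for this example; a browser would of course use colour or
some other visual cue.

```python
def display_form(hostname):
    """Return the host name with every non-ASCII label shown in its
    punycode ("xn--") form and wrapped in brackets to make it stand out."""
    shown = []
    for label in hostname.split("."):
        # The built-in "idna" codec applies nameprep and punycode (IDNA 2003).
        ace = label.encode("idna").decode("ascii")
        if ace.lower().startswith("xn--"):
            shown.append(f"[{ace}]")  # internationalized label: flag it
        else:
            shown.append(ace)         # plain ASCII label: leave as is
    return ".".join(shown)

# "paypal" written with a Cyrillic "a" (U+0430) as its second letter.
print(display_form("p\u0430ypal.com"))
# The genuine ASCII name is left untouched.
print(display_form("paypal.com"))
```

The spoofed name stops looking like the real one as soon as it is displayed
this way, whatever fonts the user has installed.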
Also, one might have a look back at RFC 3454 (StringPrep) while discussing
this issue. This RFC says, among other things:
: 9.1 Stringprep-specific security considerations
:
: The Unicode and ISO/IEC 10646 repertoires have many characters that
: look similar. In many cases, users of security protocols might do
: visual matching, such as when comparing the names of trusted third
: parties. Because it is impossible to map similar-looking characters
: without a great deal of context such as knowing the fonts used,
: stringprep does nothing to map similar-looking characters together
: nor to prohibit some characters because they look like others. User
: applications can help disambiguate some similar-looking characters by
: showing the user when a string changes between scripts.
[more interesting text follows].
Another version of this particular piece is also in RFC 3491 (Nameprep), a
more direct reference for an implementor.
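For implementors who want to see this behaviour, here is a small sketch
using the nameprep function that ships in Python's encodings.idna module
(an implementation of RFC 3491). It illustrates the quoted paragraph:
case and compatibility variants, such as fullwidth Latin letters, are folded
to the same string, but look-alike characters from another script are not.
The sample strings are chosen for illustration only.

```python
from encodings.idna import nameprep

ascii_name = "paypal"

# Fullwidth Latin letters (U+FF50 ...): compatibility variants of ASCII.
fullwidth = "\uff50\uff41\uff59\uff50\uff41\uff4c"
# Same name with a Cyrillic "a" (U+0430) in second position.
cyrillic_a = "p\u0430ypal"

# Case and width differences are folded away by nameprep...
print(nameprep("PayPal") == ascii_name)
print(nameprep(fullwidth) == ascii_name)
# ...but a visually similar letter from another script is kept distinct.
print(nameprep(cyrillic_a) == ascii_name)
```

So two names that nameprep considers different can still look identical to
a user, which is exactly the limitation the RFC text admits.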
Antoine