Florian Weimer <fw@deneb.enyo.de> wrote:
>  > It will always be necessary for people to think a bit when creating
>  > their email addresses,...
>
>  Well, you can't expect people to know most of Unicode just to choose
>  an email address. :-/
and then later:
>  > In general, the problem is unsolvable. There are several look-alikes
>  > among the Cyrillic, Greek, Latin and Cherokee blocks, among others. 
>  
>  And those are not equivalent under normalization?  That's a pity.

As others have explained, Unicode does not specify (nor should it) any type
of "normalization" mechanism to equate similar-looking glyphs that belong to
different scripts.

One of the primary purposes of Unicode is to support many scripts with the 
same character set, instead of requiring different 8-bit code pages for 
Western European Latin, Eastern European Latin, Latin + Greek, Latin + 
Cyrillic, etc.  As a result, if this were a Unicode document (encoded in 
UTF-8 or by other means), it could contain the glyph H by itself and you 
might not have any visual way to tell whether it was:
  -  U+0048 LATIN CAPITAL LETTER H
  -  U+0397 GREEK CAPITAL LETTER ETA
  -  U+041D CYRILLIC CAPITAL LETTER EN
or even
  -  U+13BB CHEROKEE LETTER MI

There is nothing wrong with this. As humans we normally have no need to
identify the script of a single isolated glyph, and when we do, we usually
have some context to help us make that determination (for example, H comes
after G in the Latin alphabet).
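
For anyone who wants to check this, here is a rough sketch, assuming Python 3 and its standard unicodedata module. The names LOOKALIKES, describe and distinct_after_normalization are made up for illustration. It prints the character name behind each of the four code points listed above and tests whether any of the four standard normalization forms maps two of them to the same string.

```python
import unicodedata

# The four look-alike capital letters listed above, written as escapes
# so that the source itself is unambiguous.
LOOKALIKES = ["\u0048", "\u0397", "\u041D", "\u13BB"]


def describe(ch):
    """Return the code point and Unicode character name of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


def distinct_after_normalization(chars, form):
    """True if no two of the characters normalize to the same string."""
    normalized = {unicodedata.normalize(form, ch) for ch in chars}
    return len(normalized) == len(chars)


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch))
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, distinct_after_normalization(LOOKALIKES, form))
```

If normalization did equate look-alikes across scripts, at least one of the four forms would report False here. This only checks these four characters, not look-alikes in general.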

I don't know what a person would intend by deliberately inserting a
similar-looking Greek or Cyrillic letter in the middle of some Latin text.  I
do have at least one KOI8-R document that has Latin A and O in place of the
proper Cyrillic letters, but that just shows that this is a multi-script
issue that has little or nothing to do with Unicode.

-Doug Ewell
 Fullerton, California
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT