Florian Weimer <fw@deneb.enyo.de> wrote:
> > It will always be necessary for people to think a bit when creating
> > their email addresses,...
>
> Well, you can't expect people to know most of Unicode just to choose
> an email address. :-/
and then later:
> > In general, the problem is unsolvable. There are several look-alikes
> > among the Cyrillic, Greek, Latin and Cherokee blocks, among others.
>
> And those are not equivalent under normalization? That's a pity.
As others have explained, Unicode does not specify (nor should it) any type
of "normalization" mechanism to equate similar-looking characters that belong
to different scripts.
One of the primary purposes of Unicode is to support many scripts with the
same character set, instead of requiring different 8-bit code pages for
Western European Latin, Eastern European Latin, Latin + Greek, Latin +
Cyrillic, etc. As a result, if this were a Unicode document (encoded in
UTF-8 or by other means), it could contain the glyph H by itself and you
might not have any visual way to tell whether it was:
- U+0048 LATIN CAPITAL LETTER H
- U+0397 GREEK CAPITAL LETTER ETA
- U+041D CYRILLIC CAPITAL LETTER EN
or even
- U+13BB CHEROKEE LETTER MI
There is nothing wrong with this, because as humans we normally have no need
to identify the script of a single isolated glyph, or else we have some
context to help us make that determination (such as, H comes after G).
I don't know what a person would intend by deliberately inserting a
similar-looking Greek or Cyrillic letter in the middle of some Latin text. I
do have at least one KOI8-R document which has Latin A and O in place of the
proper Cyrillic versions, but that just shows that this is a multi-script
issue that has little or nothing to do with Unicode.
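For anyone who wants to spot such mixtures mechanically, the following is a crude sketch, not a real script detector. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, which holds for the letters discussed here but not in general. The function scripts_in and the sample word are hypothetical, made up for the example.

```python
import unicodedata

def scripts_in(text):
    """Return the set of first name-words of the letters in text.

    Heuristic only: the first word of the character name (LATIN, GREEK,
    CYRILLIC, ...) is used as an approximation of the script.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found

# A Cyrillic word in which Latin O and A replace the Cyrillic letters.
mixed = "\u041cO\u0421\u041a\u0412A"
print(scripts_in(mixed))

# The same two letters are also distinct bytes in KOI8-R, so the
# confusion does not depend on Unicode.
latin_a = "A".encode("koi8_r")
cyrillic_a = "\u0410".encode("koi8_r")
print(latin_a, cyrillic_a)
```

A word that reports more than one script is only a candidate for a closer look; plenty of legitimate text mixes scripts.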
-Doug Ewell
Fullerton, California