RE: Identifiers

From: Yves Arrouye (yves@realnames.com)
Date: Mon Apr 16 2001 - 11:09:58 EDT


> > On Sun, Apr 15, 2001 at 08:10:55PM +0200, Florian Weimer wrote:
> > > Is it sufficient to mandate that all such identifiers MUST be KC- or
> > > KD-normalized? Does this guarantee print-and-enter round-trip
> > > compatibility?
> >
> > In general, the problem is unsolvable. There are several look-alikes
> > among the Cyrillic, Greek, Latin and Cherokee blocks, among others.
>
> And those are not equivalent under normalization? That's a pity.

But that is not the goal of Unicode normalization! (Read UAX #15,
http://www.unicode.org/unicode/reports/tr15/.) That is to be expected from
a standard about characters, and not glyphs.
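
To make the distinction concrete, here is a minimal Python sketch using the
standard unicodedata module. The helper name nfkc and the sample characters
are illustrative choices, not taken from any specification. It shows that
NFKC unifies compatibility variants of the same character but leaves
look-alike letters from different scripts distinct.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Return the NFKC-normalized form of s (illustrative helper)."""
    return unicodedata.normalize("NFKC", s)


# Compatibility variants of the *same* character are unified by NFKC.
fullwidth_a = "\uFF41"  # FULLWIDTH LATIN SMALL LETTER A
ligature_fi = "\uFB01"  # LATIN SMALL LIGATURE FI
print(nfkc(fullwidth_a) == "a")   # True
print(nfkc(ligature_fi) == "fi")  # True

# Look-alike letters from *different* scripts are separate characters,
# so normalization leaves them distinct.
latin_a = "\u0061"        # LATIN SMALL LETTER A
cyrillic_a = "\u0430"     # CYRILLIC SMALL LETTER A
latin_o = "\u006F"        # LATIN SMALL LETTER O
greek_omicron = "\u03BF"  # GREEK SMALL LETTER OMICRON
print(nfkc(cyrillic_a) == nfkc(latin_a))     # False
print(nfkc(greek_omicron) == nfkc(latin_o))  # False
```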

The normalization you are talking about seems to me to be glyph-centric:
you're looking at shapes and want to avoid confusion by making
similar-looking things the same. We have a normalization similar to
the one you're talking about in our Internet Keywords system. It is built on
top of NFKC. It is good for users, but then it is also very specific. For
example, we didn't consider the look-alikes among Cyrillic, Greek, and Latin
to be a problem for our users, but your comment about that being a pity
seems to imply that you would. I think such normalizations depend a lot
on who is going to need the names and in what context. It'll be very hard
to make a general recommendation that isn't too restrictive for many.
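
As a rough illustration of what a glyph-centric layer on top of NFKC could
look like, and why it is context-dependent, here is a small Python sketch.
The LOOKALIKES table and the fold_lookalikes function are hypothetical and
deliberately tiny; they are not the Internet Keywords rules, and a real
system would need a much larger, carefully reviewed table.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table (an assumption made for
# illustration only). It maps a few Cyrillic and Greek letters to the Latin
# letters they resemble.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_lookalikes(s: str) -> str:
    """Apply NFKC, then fold look-alike letters using the table above."""
    normalized = unicodedata.normalize("NFKC", s)
    return "".join(LOOKALIKES.get(ch, ch) for ch in normalized)


# A name mixing Cyrillic "a" into Latin text folds to plain Latin.
print(fold_lookalikes("p\u0430yp\u0430l") == "paypal")  # True

# The same rule also rewrites legitimate Greek text: the omicron in a
# Greek word becomes a Latin "o", producing a mixed-script string.
print(fold_lookalikes("\u03BF\u03C7\u03B9") == "o\u03C7\u03B9")  # True
```

The second case is the restrictiveness concern in miniature: a folding that
helps one group of users can mangle names that are perfectly valid for
another.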

YA
  


