From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 25 2007 - 18:56:33 CST
Ruszlán asked:
> Kenneth Whistler wrote:
>
> > 1. U+200D is not part of any canonical decomposition mapping,
> > and there is 0% chance that the UTC would ever add ZWJ or
> > ZWNJ to such mappings.
>
> And why not?
Because, as Richard Wordingham pointed out:
"In general, the effects of ZWJ and ZWNJ are optional."
Canonical decomposition mappings are normative, required
relations that define identity between characters and other
sequences of characters.
ZWJ and ZWNJ are, for the most part, hints regarding presentation
and rendering, and have no impact on the interpretation of
the identity of sequences.
You are trying to use them to "enforce" shaping *and* to
create canonical equivalences, where the UTC did not intend
such effects and where implementers (for many years now) have
not treated them that way.
As I said, there is 0% chance that the UTC is going to revisit
such decisions and try to repurpose these characters to
participate in canonical decomposition mappings.
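
For anyone who wants to check point 1 directly, here is a minimal sketch using Python's standard unicodedata module. It reflects whatever Unicode version the interpreter was built with, and the helper name has_decomposition is just an illustrative choice, not part of any library:

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def has_decomposition(ch):
    """True if the character has any decomposition mapping in the UCD data."""
    # unicodedata.decomposition returns an empty string when there is no mapping.
    return unicodedata.decomposition(ch) != ""

zwj_has_mapping = has_decomposition(ZWJ)
zwnj_has_mapping = has_decomposition(ZWNJ)

# A joiner between two letters is left in place by every normalization form,
# because no mapping either produces it or removes it.
text = "a" + ZWJ + "b"
forms = {
    form: unicodedata.normalize(form, text)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print("ZWJ has a decomposition mapping: ", zwj_has_mapping)
print("ZWNJ has a decomposition mapping:", zwnj_has_mapping)
for form, result in forms.items():
    print(form, "unchanged:", result == text)
```

Neither joiner carries a mapping, and each of the four forms returns the input string unchanged.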
> > 2. U+00C6 has no decomposition mapping now, and by normalization
> > stability guarantees, none can be added.
> >
> > Please study:
> >
> > http://www.unicode.org/standard/stability_policy.html
> >
> > and in particular, item 3a under Decomposition Mapping in
> > the Normalization Stability Policy.
>
> Hmmm... ok, though I cannot quite see the rationale behind such
> restrictive policy.
> Why can't decomposition mappings be version-specific?
Because then the status of strings as normalized or not
would also be version-specific. A string stored by a Unicode 4.0
application might turn out not to be normalized when read
by a Unicode 5.0 application. That would be a completely
unacceptable outcome for nearly all of the implementers
out there -- and in particular for databases and all
internet applications.
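
To make the problem concrete, here is a small sketch in Python. The first check uses the real data shipped with the interpreter's unicodedata module. Everything named TOY_* or toy_* is a hypothetical stand-in for "version-specific" mapping tables, invented for illustration only. It is not real Unicode data, and the imagined mapping is not something the standard defines:

```python
import unicodedata

AE = "\u00C6"  # LATIN CAPITAL LETTER AE

# Real data: U+00C6 has no decomposition mapping.
ae_real_mapping = unicodedata.decomposition(AE)  # empty string

# Hypothetical tables, NOT real Unicode data.
TOY_OLD = {}                   # older version: AE has no mapping
TOY_NEW = {AE: "A\u200DE"}     # imagined later version that adds one

def toy_decompose(text, table):
    """Apply a toy one-level decomposition table to every character."""
    return "".join(table.get(ch, ch) for ch in text)

def toy_is_normalized(text, table):
    """A string is 'normalized' under a table if decomposing changes nothing."""
    return toy_decompose(text, table) == text

# A string written out by an application using the older table...
stored = AE + "sir"
ok_when_written = toy_is_normalized(stored, TOY_OLD)

# ...is no longer normalized when read by one using the newer table.
ok_when_read = toy_is_normalized(stored, TOY_NEW)

print("real mapping for U+00C6:", repr(ae_real_mapping))
print("normalized under old table:", ok_when_written)
print("normalized under new table:", ok_when_read)
```

The stored data did not change between the two checks, only the table did. That is the situation the stability policy rules out: a database index or a signed identifier built on the old result would silently stop matching.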
--Ken