From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Apr 03 2005 - 15:05:00 CST
From: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
>> Do the recent additions to Unicode 4.1 make any changes to NFC? i.e.
>> does a program that correctly performs normalization on Unicode 4.0
>> data need any updates, to data tables or algorithms, to normalize
>> Unicode 4.1 data in normalization form C?
>
> Yes. New CJK compatibility ideographs U+FA70..U+FAD9 have canonical
> decompositions into single characters. For example NFC(U+FACF) =
> U+2284A (for the first time a BMP character is normalized to something
> outside BMP).
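The quoted example can be checked against whatever Unicode database a given runtime ships. The following is only a minimal sketch, assuming Python's standard `unicodedata` module built on Unicode 4.1 or later; the helper name `nfc_report` is made up for illustration and is not part of any library.

```python
import unicodedata


def nfc_report(ch):
    """Return (NFC form, raw decomposition mapping, whether NFC leaves the BMP).

    Hypothetical helper for illustration; it only wraps standard
    unicodedata calls and uses the database bundled with the interpreter.
    """
    composed = unicodedata.normalize("NFC", ch)
    mapping = unicodedata.decomposition(ch)  # hex code points, '' if none
    leaves_bmp = any(ord(c) > 0xFFFF for c in composed)
    return composed, mapping, leaves_bmp


if __name__ == "__main__":
    composed, mapping, leaves_bmp = nfc_report("\uFACF")
    print("database version:", unicodedata.unidata_version)
    print("NFC(U+FACF) =", " ".join("U+%04X" % ord(c) for c in composed))
    print("decomposition mapping:", mapping)
    print("result outside the BMP:", leaves_bmp)
```

Because the mapping is a singleton canonical decomposition, NFC does not recompose it, which is why a BMP code point ends up as a supplementary one.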
Isn't that against Unicode stability? Shouldn't it have been the reverse,
keeping U+FACF stable and normalizing U+2284A to U+FACF to preserve
compatibility? If this was added because of a past error, then this MUST be
urgently documented.
I had really thought the NFC and NFD normalizations were intended to be FULLY
stable (in the absence of an obvious error corrected in a corrigendum, but not
in a release) within the set of codepoints that have received standard
assignments.
If things change within the set of newly assigned codepoints, this is not an
issue, as existing documents normalized in the past should not have used
them (and if they did, they were already non-conforming...)
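Whether a given code point falls into this "newly assigned" case can be tested by comparing two versions of the database. This is a rough sketch under a stated assumption: Python only bundles a Unicode 3.2.0 snapshot (`unicodedata.ucd_3_2_0`) alongside its current data, so 3.2.0 stands in here for "an earlier version" rather than 4.0.1 itself. The function name `assignment_status` is hypothetical.

```python
import unicodedata


def assignment_status(ch):
    """Compare category and NFC of ch in the old and current databases.

    Assumption: unicodedata.ucd_3_2_0 (Unicode 3.2.0) is used as the
    "earlier version"; it is not the 4.0.1 data discussed in the thread.
    """
    old = unicodedata.ucd_3_2_0
    return {
        "old_category": old.category(ch),        # 'Cn' means unassigned
        "new_category": unicodedata.category(ch),
        "old_nfc": old.normalize("NFC", ch),
        "new_nfc": unicodedata.normalize("NFC", ch),
    }


if __name__ == "__main__":
    for name, value in assignment_status("\uFACF").items():
        shown = value if name.endswith("category") else "U+%04X" % ord(value)
        print(name, "=", shown)
```

A code point reported as unassigned (`Cn`) in the older data and assigned in the newer data belongs to the case described above: text normalized under the older version should not have contained it.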
> These are the only differences in NFC/NFD between Unicode 4.0.1 and 4.1.0.
>
> There are 48 more differences in NFKC/NFKD.
These are less serious. If a new 4.1 character now has a decomposition into
characters already in Unicode 4.0, that respects the stability principles.
I will download the new UCD and look at it seriously when I've got some time.
If what you say is true, then there's a real problem in the way Unicode now
considers its "stability pact": Unicode can change its opinion for such
characters, yet refuses to change anything in the normalization of other
scripts like Hebrew, which are poorly served by their sub-optimal combining
classes...
So please, at the UTC, demonstrate that those changes were absolutely
needed, because the previous normalizations were obviously wrong.