From: Peter Kirk (peterkirk@qaya.org)
Date: Sun Apr 03 2005 - 16:36:15 CST
On 03/04/2005 22:28, Doug Ewell wrote:
>Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>>>Yes. New CJK compatibility ideographs U+FA70..U+FAD9 have canonical
>>>decompositions into single characters. For example NFC(U+FACF) =
>>>U+2284A (for the first time a BMP character is normalized to
>>>something outside BMP).
>>>
>>Isn't that against Unicode stability? Shouldn't it have been the
>>reverse, keeping U+FACF stable and normalizing U+2284A to U+FACF to
>>keep the compatibility? If this was added because of a past error,
>>then this MUST be urgently documented.
>
>They're new characters, Philippe. They weren't encoded until 4.1.
>
In that case these character allocations seem perverse: both characters
could have been assigned to the BMP, or both outside it, or the
normalisation could have run in the reverse direction, as Philippe
suggested. Introducing a BMP character which normalises to a character
outside the BMP carries a serious danger of breaking existing
implementations, especially those which fully support only the BMP. The
BMP is no longer a closed subset of Unicode under operations such as
normalisation, although existing implementations expected it to be
closed. Maybe someone thought it was a good idea to force
implementations to be upgraded, but it strikes me as a recipe for
disaster. It could also be a serious security hole, as hackers try
sending U+FACF to various implementations in an attempt to crash them.
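
To make the closure concern concrete, here is a minimal sketch of how one
might list the BMP characters whose normalised form leaves the BMP. It
assumes Python 3 and its standard unicodedata module, whose tables follow
whatever Unicode version the interpreter ships with (that version must be
4.1 or later to include U+FA70..U+FAD9). The function name bmp_escapes is
hypothetical, chosen only for illustration.

```python
import unicodedata


def bmp_escapes(form="NFC"):
    """Return BMP code points whose normalised form contains a non-BMP character.

    `bmp_escapes` is an illustrative name, not part of any library.
    """
    found = []
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters; skip them
        normalised = unicodedata.normalize(form, chr(cp))
        # Python 3 strings are sequences of code points, so ord() > 0xFFFF
        # means the character lies outside the BMP.
        if any(ord(ch) > 0xFFFF for ch in normalised):
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in bmp_escapes("NFC"):
        target = unicodedata.normalize("NFC", chr(cp))
        print(f"U+{cp:04X} -> " + " ".join(f"U+{ord(ch):04X}" for ch in target))
```

An implementation that stores text in 16-bit units needs a surrogate pair
for every target this prints, which is exactly the case BMP-only code is
likely never to have exercised.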
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/