From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue May 09 2006 - 09:48:36 CDT
To make things even clearer, the normalization stability policy should state explicitly that stability is NOT guaranteed for sequences of characters which contain any unallocated codepoints.
This would go in the first modified paragraph related to Unicode stability.
This would really warn all users of normalization algorithms not to include any unallocated codepoint in their normalized texts. It would also mean that a process that NEEDS this stability should reject all texts containing codepoints that are unallocated in its current version of the Unicode database.
This warning should be included in the modified paragraph that speaks about "forbidden" characters.
A conforming application should then be free to reject texts containing codepoints that its built-in version of the UCD does not yet support. If an application tolerates those texts, then it should not assume the stability of normalized forms, and so had better not apply any normalization, to keep the texts intact (this is conforming behavior, as normalization of texts is not mandatory in conforming applications).
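As a rough illustration of the two behaviors described above (reject the text, or tolerate it but leave it unnormalized), here is a minimal Python sketch. The function names are hypothetical, and it assumes that general category "Cn" in the interpreter's bundled UCD is an acceptable test for "unallocated"; note that "Cn" also covers noncharacters such as U+FFFF.

```python
import unicodedata

def unassigned_code_points(text):
    """Return the characters of `text` whose general category is Cn
    (unassigned, or noncharacter) in this interpreter's UCD version."""
    return [ch for ch in text if unicodedata.category(ch) == "Cn"]

def require_stable(text):
    """Strict behavior: reject any text whose normalized form is not
    covered by the stability guarantee."""
    bad = unassigned_code_points(text)
    if bad:
        listing = ", ".join("U+%04X" % ord(ch) for ch in bad)
        raise ValueError("unassigned code points: " + listing)
    return text

def normalize_if_stable(text, form="NFC"):
    """Tolerant behavior: normalize only when every code point is
    assigned; otherwise return the text intact."""
    if unassigned_code_points(text):
        return text
    return unicodedata.normalize(form, text)
```

For example, normalize_if_stable("e\u0301") composes to U+00E9, while the same string followed by U+0378 (unassigned at the time of writing) is returned unchanged.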
This impacts other Unicode algorithms, such as collation (the sort order of texts containing unallocated codepoints is NOT defined and NOT stable as long as those codepoints are not officially standardized), case mappings, and so on (because a conforming implementation of those Unicode algorithms MUST return canonically equivalent strings for canonically equivalent input texts, with some implementations performing a normalization of their output to implement this requirement, although the normalization of output is not required)...
And of course this includes IDN, which depends on Unicode normalization stability (so IDN-capable registries should really not accept any codepoint in any domain name part as long as that codepoint has not been officially allocated). This verification step is not required within IDN clients (because the IDN-capable registry or DNS server will return "domain name not found" errors if a client sends names containing unallocated codepoints).
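The registry-side check could look roughly like the sketch below. It is not an IDNA implementation: the function names are hypothetical, and it only tests each label for codepoints that are unassigned in a chosen UCD version. Python's unicodedata module exposes the Unicode 3.2 data as unicodedata.ucd_3_2_0, which is used here as an example of a registry pinned to an older database.

```python
import unicodedata

def rejected_labels(domain, ucd=unicodedata):
    """Return the labels of `domain` that contain at least one code point
    with general category Cn in the given UCD object."""
    return [
        label
        for label in domain.split(".")
        if any(ucd.category(ch) == "Cn" for ch in label)
    ]

def registry_accepts(domain, ucd=unicodedata):
    """Hypothetical registration policy: accept the name only if no
    label contains an unassigned code point."""
    return not rejected_labels(domain, ucd)

# U+1E9E was allocated after Unicode 3.2, so a registry pinned to the
# 3.2 database would refuse it even though a newer database knows it.
name = "stra\u1e9ee.example"
print(registry_accepts(name))                             # current UCD
print(registry_accepts(name, ucd=unicodedata.ucd_3_2_0))  # Unicode 3.2
```

Passing the UCD object as a parameter keeps the policy tied to one explicit database version, which is the point of the argument: the answer depends on which version the registry has built in.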
Philippe.
----- Original Message -----
From: "Mark Davis" <mark.davis@icu-project.org>
To: <unicode@unicode.org>
Sent: Tuesday, May 09, 2006 12:10 AM
Subject: PRI#86 Update
> Re: http://www.unicode.org/review/#pri86
>
> There is additional informative text in UAX #15: Normalization Forms
> based on text contributed by Ken Whistler, at
>
> http://www.unicode.org/reports/tr15/tr15-26.html#Forbidding_Characters.
> Feedback is welcome.
This archive was generated by hypermail 2.1.5 : Tue May 09 2006 - 09:54:52 CDT