
Corrigendum #5: Normalization Idempotency


Corrigendum: Corrigendum #5: Normalization Idempotency
Effective Date: 2005-Feb-07 [102-C3, PRI #61, PRI #29]
Applicable Versions: 3.0.0 to 4.0.1
Fixed Version: 4.1.0
Result Documented In: UAX #15


The language of the specification of UAX #15: Unicode Normalization Forms (citing Version 4.0) for forms NFC and NFKC is not logically self-consistent in The Unicode Standard, Versions 3.0 through 4.0.1. Programs that depend on such logical consistency could be subject to security problems until fixed, although as yet no realistic scenarios are known that would present such problems. The problem text occurs in Definition D2, which defines what it means for a character to be blocked. This corrigendum provides a textual fix for this problem.

The change will not have an impact on real data found in practice (with the possible exception of test cases for the algorithm itself), because the affected sequences do not constitute well-formed text in any known language.
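The idempotency named in the title means that normalizing text that is already normalized leaves it unchanged. The following is a minimal sketch of such a check, not part of the corrigendum itself. It assumes Python's standard unicodedata module, whose data comes from a Unicode version later than 4.0.1, so it reflects the corrected definition. The helper name is_idempotent and the sample sequence (U+0B47, U+0300, U+0B3E, chosen as an illustration of a starter, a non-starter mark, and a second character with combining class 0) are assumptions made for this example.

```python
import unicodedata


def is_idempotent(text: str, form: str = "NFC") -> bool:
    """Return True if normalizing twice gives the same result as normalizing once.

    Hypothetical helper for illustration; `form` is any form name accepted
    by unicodedata.normalize ("NFC", "NFKC", "NFD", "NFKD").
    """
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# Illustrative sequence: ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT,
# ORIYA VOWEL SIGN AA (assumed sample, not taken from the corrigendum text).
sample = "\u0B47\u0300\u0B3E"

print(is_idempotent(sample, "NFC"))
print(is_idempotent(sample, "NFKC"))
```

A check like this only tests the library in use; it says nothing about implementations built strictly on the uncorrected wording.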

For more background information, see Public Review Issue #29, Normalization Issue.

Changes to the Text of UAX #15

Whenever this corrigendum is applied to a version of Unicode from Unicode 3.0.0 to Unicode 4.0.1, the text for definition D2 in UAX #15 is changed by adding two words, "or higher" (underlined in the original), so that it has the following wording:

D2. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same or higher combining class as C.
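Definition D2 can be sketched as a small predicate. This is an illustrative reading of the wording above, not the normative algorithm. It assumes Python's standard unicodedata.combining for canonical combining classes; the function name is_blocked and the corrected flag (which switches between the corrected wording and the earlier "same combining class" wording) are hypothetical names introduced for this example.

```python
import unicodedata


def is_blocked(seq: str, c_index: int, corrected: bool = True) -> bool:
    """Report whether seq[c_index] (C) is blocked from seq[0] (the starter S).

    corrected=True  : B blocks C if B is a starter or ccc(B) >= ccc(C).
    corrected=False : B blocks C if B is a starter or ccc(B) == ccc(C).
    """
    if unicodedata.combining(seq[0]) != 0:
        raise ValueError("seq must begin with a starter (combining class 0)")
    ccc_c = unicodedata.combining(seq[c_index])
    # Examine every character B strictly between S and C.
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0:
            return True  # B is a starter
        if corrected and ccc_b >= ccc_c:
            return True  # same or higher combining class
        if not corrected and ccc_b == ccc_c:
            return True  # same combining class only (earlier wording)
    return False


# S = U+0B47 (class 0), B = U+0300 (class 230), C = U+0B3E (class 0).
seq = "\u0B47\u0300\u0B3E"
print(is_blocked(seq, 2, corrected=True))
print(is_blocked(seq, 2, corrected=False))
```

In this sample the two wordings disagree: B is not a starter and its combining class (230) differs from that of C (0), so only the "or higher" clause makes C blocked.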

Explanatory text on the implications of this corrigendum for implementations can be found in UAX #15: Unicode Normalization Forms, in Section 3.3, Guaranteeing Process Stability, and Section 20, Corrigendum 5 Sequences.
