L2/11-283

Date: Thu, 21 Jul 2011 11:24:00 -0400
From: Behdad Esfahbod
Subject: Document more normalization invariants

Please add to the UTC agenda.

I would like to propose documenting the following normalization invariants and guaranteeing them in the stability policy.

1) The stability policy already says:

  "Canonical mappings (Decomposition_Mapping property values) are always limited either to a single value or to a pair. The second character in the pair cannot itself have a canonical mapping."

However, these two properties are not well documented in UAX #15. I believe they are worth documenting there.

2) The full canonical decomposition of a character does not expand to more than four characters. This is currently the case, but there is no guarantee that it will remain so. Given that violating this rule would require encoding at least five characters, I'm fairly confident that such a mapping will not be encoded in future versions, but if that is the consensus at UTC, maybe it's worth documenting. Note that there is already such a guarantee for NFC (x3). What I'm suggesting is to document the maximum expansion for NFD as x4.

3) The full compatibility decomposition of a character does not expand to more than 18 characters. As in the previous case, if the consensus at UTC is that such a decomposition is no longer acceptable for encoding, can that be documented as well?

If these cannot be set in stone, maybe they can be added to the "Invariants in Implementations" section of TR44.

Cheers,
behdad
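
The three properties above can be checked empirically against whatever version of the Unicode Character Database a given runtime ships. What follows is a minimal sketch, assuming Python's standard `unicodedata` module; the helper names `canonical_mapping` and `check_invariants` are illustrative and do not come from any specification. A clean result only describes the data version bundled with the interpreter and says nothing about future versions.

```python
import sys
import unicodedata


def canonical_mapping(ch):
    """Return the canonical Decomposition_Mapping of ch as a list of
    characters, or [] if ch has no mapping or only a compatibility one."""
    raw = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>" in front.
    if not raw or raw.startswith("<"):
        return []
    return [chr(int(cp, 16)) for cp in raw.split()]


def check_invariants():
    """Scan all scalar values and return
    (mapping_violations, max_nfd_length, max_nfkd_length)."""
    mapping_violations = []
    max_nfd = 0
    max_nfkd = 0
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values
        ch = chr(cp)

        # Item 1: a canonical mapping is a single value or a pair, and
        # the second member of a pair has no canonical mapping itself.
        mapping = canonical_mapping(ch)
        if len(mapping) > 2:
            mapping_violations.append(cp)
        elif len(mapping) == 2 and canonical_mapping(mapping[1]):
            mapping_violations.append(cp)

        # Items 2 and 3: length, in code points, of the full canonical
        # and full compatibility decompositions of a single character.
        max_nfd = max(max_nfd, len(unicodedata.normalize("NFD", ch)))
        max_nfkd = max(max_nfkd, len(unicodedata.normalize("NFKD", ch)))
    return mapping_violations, max_nfd, max_nfkd


if __name__ == "__main__":
    violations, max_nfd, max_nfkd = check_invariants()
    print("Unicode data version:", unicodedata.unidata_version)
    print("mapping violations:", [f"U+{cp:04X}" for cp in violations])
    print("max NFD expansion (code points):", max_nfd)
    print("max NFKD expansion (code points):", max_nfkd)
```

The scan covers every scalar value, so it takes a few seconds. Lengths are counted in code points, which matches the "characters" wording of items 2 and 3 rather than any particular encoding form.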