From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Jul 24 2010 - 11:35:29 CDT
"Clark S. Cox III" <clarkcox3@me.com>
> How can *any* combining character have a combining class of zero? Isn't that a contradiction in terms?
>
> The U+035D in your example, for instance, has a combining class of 234.
No contradiction. Not all combining characters have a non-zero
combining class. The combining class alone is not sufficient to
identify all combining characters.
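This is easy to check with Python's standard `unicodedata` module. There, `combining()` returns the canonical combining class and `category()` returns the General Category. The sketch below is minimal and assumes the Unicode definition of a combining character: any character in General Category Mn, Mc or Me. The sample characters were chosen here for illustration and are not from the thread:

```python
import unicodedata

# Sample characters chosen for illustration (not from the original thread).
SAMPLES = {
    "U+0301 COMBINING ACUTE ACCENT": "\u0301",
    "U+035D COMBINING DOUBLE BREVE": "\u035d",
    "U+0903 DEVANAGARI SIGN VISARGA": "\u0903",
    "U+20DD COMBINING ENCLOSING CIRCLE": "\u20dd",
    "U+0041 LATIN CAPITAL LETTER A": "A",
}

def is_combining_mark(ch: str) -> bool:
    """Combining character in the Unicode sense: General Category Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")

for label, ch in SAMPLES.items():
    # Print the General Category, the canonical combining class and the mark test.
    print(f"{label:36} category={unicodedata.category(ch)} "
          f"ccc={unicodedata.combining(ch):3} mark={is_combining_mark(ch)}")
```

With the Unicode data shipped in CPython, U+0903 (Mc) and U+20DD (Me) are both reported as combining marks with a combining class of 0.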
Yes, U+035D has a non-zero combining class, but not all characters with
combining class 0 are base characters. For example, there exist valid
sequences that start with a character of combining class 0 and are
still defective, because that character is itself a combining mark
even though its combining class is 0.
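Here is a simplified sketch of that situation, again using Python's `unicodedata`. The hypothetical helper `starts_defective` flags a sequence whose first code point is a combining mark. The equally hypothetical `naive_starts_defective` tests the combining class instead. Both helpers inspect only the first code point, so neither is a full implementation of the Unicode definition of a defective combining character sequence:

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """Combining character in the Unicode sense: General Category Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")

def starts_defective(seq: str) -> bool:
    """Hypothetical helper: True if seq begins with a combining mark, so it has
    no base character to attach to. Only the first code point is inspected."""
    return bool(seq) and is_combining_mark(seq[0])

def naive_starts_defective(seq: str) -> bool:
    """Hypothetical helper showing the flawed shortcut: it treats 'non-zero
    combining class' as 'combining character' and so misses class-0 marks."""
    return bool(seq) and unicodedata.combining(seq[0]) != 0

for seq in ["e\u0301", "\u0301e", "\u0903", "\u20dd"]:
    codepoints = " ".join(f"U+{ord(c):04X}" for c in seq)
    print(f"{codepoints:15} category-based={starts_defective(seq)} "
          f"class-based={naive_starts_defective(seq)}")
```

For a lone U+0903, the class-based test returns False and the category-based test returns True. That is exactly the gap described above.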
The non-zero combining classes assigned to *some* (not all)
combining characters are just a convenient technical tool for
compatibility with various « legacy » encodings, a compatibility
that requires the concepts of canonical equivalence and of
normalization forms.
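The numeric classes do their visible work in the canonical reordering step used by NFD and NFC. The following sketch covers only that step, with no decomposition and no composition, and is written for illustration. `canonical_order` is a hypothetical helper name, not a library function:

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Sketch of the Unicode Canonical Ordering Algorithm (reordering step only).

    Each maximal run of characters with a non-zero canonical combining class
    is stably sorted by that class; class-0 characters never move and act as
    barriers. Decomposition is NOT performed, so this equals NFD only for
    input that contains no decomposable characters.
    """
    out: list[str] = []
    run: list[str] = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # Flush the pending run of non-starters, sorted by class (stable).
            out.extend(sorted(run, key=unicodedata.combining))
            run.clear()
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# a + U+0307 dot above (230) + U+031B horn (216) + U+0323 dot below (220)
s = "a\u0307\u031b\u0323"
print([f"U+{ord(c):04X}" for c in canonical_order(s)])
print(canonical_order(s) == unicodedata.normalize("NFD", s))
```

Marks with the same class keep their relative order because the sort is stable. So two different marks of class 230, for instance, are never swapped.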
If such legacy encodings (which are officially supported by Unicode
and by ISO, with standardized mappings to the UCS) had never existed,
the numeric combining classes, the concept of canonical equivalence
and the standard Unicode Normalization Forms C and D would not be
needed at all, and probably would never have existed. All characters,
including those that combine with a base character, would then be
entered and encoded ONLY in their logical order (as determined by
the language-specific semantics).
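The sketch below illustrates the legacy case. It decodes an ISO-8859-1 string, where the single byte 0xE9 maps to the precomposed U+00E9. It then compares that string with the same word entered in logical order as `e` followed by U+0301. The example word is arbitrary:

```python
import unicodedata

legacy = b"caf\xe9".decode("latin-1")   # ISO-8859-1 maps 0xE9 to precomposed U+00E9
logical = "cafe\u0301"                   # base letter, then U+0301 COMBINING ACUTE ACCENT

print(legacy == logical)  # False: different code point sequences
# Canonically equivalent: both normalize to the same string.
print(unicodedata.normalize("NFD", legacy) == unicodedata.normalize("NFD", logical))
print(unicodedata.normalize("NFC", logical) == legacy)
```

The two strings differ code point by code point. They compare equal only after normalization: NFD in one direction, NFC in the other.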
Combining classes are even avoided entirely (i.e. a value of 0 is
assigned) when new scripts are encoded directly in the UCS without any
previous encoding to support.
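In practice, a class of 0 means that normalization never moves such a mark. It also stops other marks from being reordered across it, so the stored order is the order in which the text was entered. Here is a small sketch with Python's `unicodedata`. U+20DD is used only because it is a convenient mark with class 0, not as an example of a newly encoded script:

```python
import unicodedata

# Both marks have non-zero classes: NFD sorts U+0323 COMBINING DOT BELOW (220)
# before U+0301 COMBINING ACUTE ACCENT (230).
reordered = unicodedata.normalize("NFD", "a\u0301\u0323")

# U+20DD COMBINING ENCLOSING CIRCLE has class 0, so it blocks reordering:
# NFD leaves the sequence exactly as it was entered.
kept = unicodedata.normalize("NFD", "a\u0301\u20dd\u0323")

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in kept])
```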