From: Francois Yergeau (FYergeau@alis.com)
Date: Mon Jan 12 2004 - 13:38:18 EST
Markus Scherer wrote:
> Clark Cox wrote:
> > According to the comment at the beginning of the file, and all that I've
> > read elsewhere, toNFC(U+1025 U+102E) should result in U+1026. However
> > both U+1025 and U+102E have combining classes of zero, so my code does
> > not compose those characters. No information that I've been able to find
> > has been able to explain this discrepancy. Any help would be greatly
> > appreciated.
>
> There is no discrepancy. The starter must have ccc==0 but the
> second character's ccc can be anything. See Hangul.
This little-known fact (along with the better-known fact that not all
characters with non-zero ccc take part in existing precomposed characters) has
prompted the W3C's Character Model spec to define "composing characters", a
concept somewhat distinct from Unicode's combining characters. Appendix C at
http://www.w3.org/International/Group/charmod-edit/Overview.html#sec-ComposingChars
contains the definition as well as a list of the characters with ccc=0 that do
take part in existing compositions; U+102E is there, of course, as well as the
above-mentioned Hangul plus some others.
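For illustration, here is a minimal sketch using Python's standard unicodedata module. It assumes the interpreter's Unicode tables, which are newer than the ones current when this was written. It first checks the two cases above: U+1025 and U+102E both have ccc=0 yet compose to U+1026 under NFC, and a Hangul L+V jamo pair composes the same way. It then roughly approximates the list of ccc=0 characters that take part in compositions by scanning canonical two-character decompositions. The helper names and the scan heuristic are hypothetical and are not the Charmod definition itself.

```python
import sys
import unicodedata


def composes_despite_ccc_zero():
    """Check the Myanmar pair and a Hangul L+V pair discussed above."""
    myanmar = "\u1025\u102e"  # MYANMAR LETTER U + MYANMAR VOWEL SIGN II
    hangul = "\u1100\u1161"   # HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A
    # Every character involved has canonical combining class 0...
    all_ccc_zero = all(unicodedata.combining(c) == 0 for c in myanmar + hangul)
    # ...yet NFC still composes each pair into a single code point.
    return (all_ccc_zero
            and unicodedata.normalize("NFC", myanmar) == "\u1026"
            and unicodedata.normalize("NFC", hangul) == "\uac00")


def ccc_zero_composing_chars():
    """Hypothetical heuristic: ccc=0 characters that appear second in a composition.

    Scans every canonical two-character decomposition, keeps the second
    character if its ccc is 0, and requires NFC of the pair to give back the
    composite, which filters out composition exclusions.
    """
    found = set()
    for cp in range(sys.maxunicode + 1):
        composite = chr(cp)
        fields = unicodedata.decomposition(composite).split()
        # Skip empty mappings, singletons and compatibility mappings ("<...>").
        if len(fields) != 2 or fields[0].startswith("<"):
            continue
        first, second = (chr(int(f, 16)) for f in fields)
        if (unicodedata.combining(second) == 0
                and unicodedata.normalize("NFC", first + second) == composite):
            found.add(second)
    # Hangul syllables are composed algorithmically, so add the vowel (V) and
    # trailing consonant (T) jamo explicitly: U+1161..U+1175, U+11A8..U+11C2.
    found.update(chr(cp) for cp in range(0x1161, 0x1176))
    found.update(chr(cp) for cp in range(0x11A8, 0x11C3))
    return found


if __name__ == "__main__":
    print(composes_despite_ccc_zero())
    chars = sorted(ccc_zero_composing_chars())
    print(" ".join(f"U+{ord(c):04X}" for c in chars))
```

The scan only sees table-driven decompositions, which is why the Hangul jamo are added by range; the resulting list may differ in detail from the one in the Charmod appendix.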
-- François
This archive was generated by hypermail 2.1.5 : Mon Jan 12 2004 - 14:14:17 EST