The canonical order carries absolutely no semantic meaning except to
separate classes of combining characters within which the order is
significant. When canonical combining classes are different, you cannot
infer any logical order between these diacritics, and you need a collation
tailoring to determine the logical order, or you need to insert additional
controls between them to encode their order when both orders are possible
and semantically different.
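
As a rough illustration of the "additional controls" option, here is a
minimal Python sketch. The choice of U+034F COMBINING GRAPHEME JOINER as
the control, and of lamed with patah and hiriq as the test sequence, are
my assumptions for the example; the point is only that a character with
combining class 0 placed between two marks stops normalization from
reordering them.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, ccc=0
PATAH = "\u05B7"  # HEBREW POINT PATAH, ccc=17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, ccc=14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, ccc=0 (assumed choice of control)

# Intended logical order: patah first, then hiriq.
plain = LAMED + PATAH + HIRIQ
guarded = LAMED + PATAH + CGJ + HIRIQ

# Canonical reordering sorts adjacent non-zero marks by combining class,
# so the unguarded sequence comes back as lamed, hiriq, patah.
nfd_plain = unicodedata.normalize("NFD", plain)

# The CGJ has combining class 0, so the two marks are no longer adjacent
# for reordering purposes and the encoded order survives.
nfd_guarded = unicodedata.normalize("NFD", guarded)

print([hex(ord(c)) for c in nfd_plain])
print([hex(ord(c)) for c in nfd_guarded])
```

The guarded sequence is no longer canonically equivalent to the unguarded
one, so this only makes sense where the two orders really do differ in
meaning.
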
This is not exceptional: the Hebrew script, for example, has such complex
cases for some diacritics that were historically given a non-zero combining
class, correctly distinct from other non-zero combining classes (but these
diacritics should probably have used a zero combining class). You cannot
solve it using only canonical order, whose only intent is to convey the
canonical equivalences.
The relative numeric values of distinct non-zero combining classes mean
nothing linguistically. All that matters is whether they are non-zero and
whether they are distinct. In other words, the combining classes have NO
order, except for normalization.
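
For the Myanmar case in the question quoted below, a small Python sketch
(my own illustration, using the standard unicodedata module and U+1000 KA
as an arbitrary base letter) shows the reordering: the sequence written in
the Table 11-3 order comes out of normalization with the dot below before
the asat, and the two spellings are canonically equivalent.

```python
import unicodedata

KA = "\u1000"         # MYANMAR LETTER KA, arbitrary base chosen for the example
ASAT = "\u103A"       # MYANMAR SIGN ASAT
DOT_BELOW = "\u1037"  # MYANMAR SIGN DOT BELOW

# Combining classes as cited in the quoted message: asat 9, dot below 7.
ccc_asat = unicodedata.combining(ASAT)
ccc_dot = unicodedata.combining(DOT_BELOW)

# Order shown in Table 11-3: asat, then dot below.
table_order = KA + ASAT + DOT_BELOW

# Canonical reordering puts the lower combining class first.
normalized = unicodedata.normalize("NFD", table_order)

print(ccc_asat, ccc_dot)
print([hex(ord(c)) for c in normalized])
```

The numeric comparison 7 < 9 decides the normalized order and nothing
else; it says nothing about which mark is logically first.
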
2013/9/6 Markus Scherer <markus.icu_at_gmail.com>
> Unicode 6.2 chapter 11.3 <http://www.unicode.org/versions/Unicode6.2.0/ch11.pdf>
> Myanmar, Table 11-3. Myanmar Syllabic Structure, shows that 103A asat sign
> comes before 1037 dot below. However, 1037 has ccc=7 which comes before (in
> canonical order) 103A which has ccc=9.
>
> Is it correct that Unicode normalization of Myanmar text moves characters
> out of the order in table 11-3?
> If so, should there be a note about this in the text? (Sorry if I just
> missed it.)
>
> markus
>
Received on Fri Sep 06 2013 - 13:21:27 CDT