From: Christopher John Fynn (cfynn@gmx.net)
Date: Sat Jun 21 2003 - 21:23:17 EDT
In Unicode's UnicodeData.txt
( http://www.unicode.org/Public/UNIDATA/UnicodeData.txt )
0F7E has a Canonical Combining Class Value (CCCV) of 0;
0F71 a CCCV of 129;
0F72 0F7A 0F7B 0F7C 0F7D and 0F80 a CCCV of 130;
0F74 a CCCV of 132;
and 0F82 and 0F83 have a CCCV of 230.
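
The values above can be checked with a minimal sketch, assuming Python's standard unicodedata module (which reads the same property from its bundled copy of the Unicode Character Database). The MARKS list and the combining_classes helper are names made up for this illustration.

```python
import unicodedata

# Code points quoted in this message (illustrative list, not exhaustive).
MARKS = [0x0F71, 0x0F72, 0x0F74, 0x0F7A, 0x0F7B, 0x0F7C, 0x0F7D,
         0x0F7E, 0x0F80, 0x0F82, 0x0F83]

def combining_classes(code_points):
    """Map each code point to its Canonical Combining Class value."""
    return {cp: unicodedata.combining(chr(cp)) for cp in code_points}

ccc = combining_classes(MARKS)
for cp, value in ccc.items():
    print(f"{cp:04X} CCCV={value}")
```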
By normal Tibetan & Dzongkha spelling, writing, and input rules,
Tibetan script stacks should be entered and written as: 1 headline
consonant (0F40-0F6A), any subjoined consonant(s) (0F90-0FBC),
achung (0F71), shabkyu (0F74), any above headline vowel(s)
(0F72 0F7A 0F7B 0F7C 0F7D and 0F80), and any ngaro (0F7E,
0F82 and 0F83).
So, following normal Tibetan & Dzongkha input and spelling rules,
the relative ordering of these characters should be:
A. 0F71
B. 0F74
C. 0F72 0F7A 0F7B 0F7C 0F7D and 0F80
D. 0F7E, 0F82 and 0F83
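
The ordering A to D above could be expressed as a small rank table. This is only a sketch: SPELLING_RANK and spelling_order are hypothetical names, not part of any standard or library, and marks not listed above are simply sorted last.

```python
# Hypothetical rank table for the spelling order A-D listed above.
SPELLING_RANK = {0x0F71: 0, 0x0F74: 1}
SPELLING_RANK.update({cp: 2 for cp in (0x0F72, 0x0F7A, 0x0F7B,
                                       0x0F7C, 0x0F7D, 0x0F80)})
SPELLING_RANK.update({cp: 3 for cp in (0x0F7E, 0x0F82, 0x0F83)})

def spelling_order(marks):
    """Stable-sort a run of marks into spelling order.

    Marks missing from the table get rank 4, so they end up last.
    """
    return "".join(sorted(marks, key=lambda ch: SPELLING_RANK.get(ord(ch), 4)))

def to_hex(text):
    """Show a string as space-separated code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

# Gigu, shabkyu and achung given in the wrong order.
print(to_hex(spelling_order("\u0F72\u0F74\u0F71")))  # 0F71 0F74 0F72
```

Because the sort is stable, marks of equal rank (for example 0F7E and 0F82) keep the order in which they were typed.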
The fact that, in the process of "canonical decomposition" or
"normalisation", these combining characters can be reordered
relative to each other by combining class (for example, an above
headline vowel with CCCV 130 is moved in front of shabkyu 0F74
with CCCV 132, and 0F82 and 0F83 can be reordered while 0F7E,
with CCCV 0, never is) causes difficulties with culturally
correct collation (where 0F7E, 0F82 and 0F83 should have an
equal value). In particular, it makes lookups in smart fonts far
more complex and inefficient than they should have to be.
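
The reordering can be reproduced with a minimal sketch, again assuming Python's unicodedata module. The nfd_hex helper is a name invented for this example, and 0F40 (KA) is used only as a sample headline consonant.

```python
import unicodedata

def nfd_hex(text):
    """Normalize to NFD and show the result as code points."""
    normalized = unicodedata.normalize("NFD", text)
    return " ".join(f"{ord(ch):04X}" for ch in normalized)

# Spelling order: consonant, shabkyu (132), gigu (130).
# Canonical ordering sorts by ascending class, so gigu moves first.
print(nfd_hex("\u0F40\u0F74\u0F72"))  # 0F40 0F72 0F74

# 0F82 (class 230) typed before a vowel is moved behind it ...
print(nfd_hex("\u0F40\u0F82\u0F72"))  # 0F40 0F72 0F82

# ... but 0F7E (class 0) is a starter and is never reordered.
print(nfd_hex("\u0F40\u0F7E\u0F72"))  # 0F40 0F7E 0F72
```

The last two cases show why 0F7E, 0F82 and 0F83 do not end up with equal treatment after normalisation.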
(In Tibetan script fonts 0F71 and 0F74 are often ligated with
the preceding consonant (plus any subjoined consonants) and
combined as a single glyph, whereas above headline vowels are
almost always treated as non-spacing combining marks.)
Currently there seems to be no easy or standardized workaround
for these problems, and the standard seems to say that the
relative values of assigned Canonical Combining Class Values
cannot be changed.
Any suggestions as to how to create a standardized workaround
for these incorrect values?
- Chris