From: Christopher John Fynn (cfynn@gmx.net)
Date: Fri Jun 27 2003 - 16:37:32 EDT
Rick McGowan <rick@unicode.org> has privately suggested moving
the discussion of Combining Classes of *Tibetan* Characters
from the main Unicode list unicode@unicode.org to the TIBEX list
tibex@unicode.org - an "experts" list which was set up several
years ago specifically to discuss proposals for encoding Tibetan
characters in Unicode. If people with a particular interest in
Tibetan characters who have been following the thread here would
like to continue following it, perhaps they could ask Rick how to
join that list.
I'll follow Rick's advice - perhaps this discussion is more
appropriate on the TIBEX list - even though the similar issues
with some Hebrew characters, which have been raised here (again)
as a result of this thread, make me think there may be a need for
a non-script-specific solution or work-around to problems with
canonical combining class values.
Anyway I'm going to move this discussion over there with a
parting shot...
Off-list, Robert Chilton has pointed out the following to me:
> 3. A very common occasion of 0F7E occurring with a vowel is in the stack
> HaUm (orthographic sequence of 0F67 0F71 0F74 0F7E). Because 0F7E is
> currently assigned a cc of zero, this *same glyph-form* could
> theoretically be encoded with a total of 6 different character
> sequences, resulting in 4(!) different sequences following
> normalization. Properly, all 6 sequences should normalize to the same
> sequence -- which is indeed the case if 0F82 or 0F83 is used in place of
> 0F7E. Obviously a major problem, not only for rendering but also for
> searching and sorting.
FOUR different sequences possible *after* "normalisation" ???
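Robert's count can be checked with a short sketch. It assumes the combining class values shipped with Python's `unicodedata` module (0F71 = 129, 0F74 = 132, 0F7E = 0, 0F83 = 230) and takes the six sequences to be 0F67 followed by every ordering of the three marks. Canonical reordering only sorts runs of marks with a non-zero class, so a class-0 mark like 0F7E acts as a barrier. The helper names (`canonical_order`, `distinct_forms`) are illustrative only:

```python
import itertools
import unicodedata

HA = "\u0F67"              # TIBETAN LETTER HA
VOWEL_AA = "\u0F71"        # TIBETAN VOWEL SIGN AA, ccc 129
VOWEL_U = "\u0F74"         # TIBETAN VOWEL SIGN U, ccc 132
RJES_SU_NGA_RO = "\u0F7E"  # ccc 0, so it blocks reordering
SNA_LDAN = "\u0F83"        # ccc 230


def canonical_order(s: str) -> str:
    """Canonical reordering only (none of these characters decompose).

    Marks with a non-zero combining class are stably sorted by class
    within each run; any class-0 character ends the run.
    """
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # stable sort
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


def distinct_forms(marks):
    """Normalise HA + every ordering of `marks`; return the set of results."""
    forms = set()
    for perm in itertools.permutations(marks):
        s = HA + "".join(perm)
        nfd = unicodedata.normalize("NFD", s)
        assert nfd == canonical_order(s)  # hand-rolled version agrees with NFD
        forms.add(nfd)
    return forms


def show(s: str) -> str:
    return " ".join(f"{ord(c):04X}" for c in s)


if __name__ == "__main__":
    for label, mark in [("0F7E", RJES_SU_NGA_RO), ("0F83", SNA_LDAN)]:
        forms = distinct_forms([VOWEL_AA, VOWEL_U, mark])
        print(f"with {label}: {len(forms)} distinct normalised sequence(s)")
        for f in sorted(forms):
            print("   ", show(f))
```

With 0F7E the six inputs collapse to only four normalised forms, because 0F71 and 0F74 are swapped only when they happen to be adjacent. With 0F83 in its place all six reduce to the single sequence 0F67 0F71 0F74 0F83.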
Personally I would rather have seen all Tibetan characters
given a CCV of 0 (and all pre-combined Tibetan characters
*strongly* deprecated) than this. If someone simply follows the
normal rules for writing Tibetan, characters will be entered in
a very predictable order, which is far easier to process than
the one(s) they can end up in after Unicode "normalisation".
- Chris Fynn
BTW My apologies to anyone who receives two copies of this
message.
This archive was generated by hypermail 2.1.5 : Fri Jun 27 2003 - 17:14:43 EDT