From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Feb 10 2003 - 22:40:12 EST
António MARTINS-Tuválkin (with no diaeresis !) asked:
> Anyway, I noted once more that many Cyrillic letters I'd consider as
> "base letter + diacritical" composites are not decomposable according to
> Unicode. I planned to delve deeper into this, but is there a short
> answer for it?
The short answer is that the extended Cyrillic characters
in question use diacritics that are mostly various distortions
of the base letterforms (the descender ticks and the various
hook forms) or involve bars across letter strokes. Long ago
it was decided that it would not be a good idea to extend
formal character decomposition to such base letterform shape
changes or bars across letters. (Note that Latin characters
with bars, such as barred-b, barred-d, barred-i, barred-u,
and barred-l, are also not formally decomposed. The same
holds for Latin letters with hooks, and so on.)
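As a rough illustration of the point above, the following Python sketch asks the standard-library unicodedata module whether a few letters with descenders, bars, or hooks carry any decomposition mapping. The sample code points and the helper name has_decomposition are choices made for this sketch, not part of the original message, and the result depends on the Unicode data shipped with the Python build in use.

```python
import unicodedata

# Illustrative sample picked for this sketch (an assumption, not an
# exhaustive list): letters whose "diacritic" is a descender, bar, or hook.
SAMPLES = [
    "\u049B",  # CYRILLIC SMALL LETTER KA WITH DESCENDER
    "\u0493",  # CYRILLIC SMALL LETTER GHE WITH STROKE
    "\u0180",  # LATIN SMALL LETTER B WITH STROKE
    "\u0142",  # LATIN SMALL LETTER L WITH STROKE
    "\u0253",  # LATIN SMALL LETTER B WITH HOOK
]


def has_decomposition(ch):
    """Return True if the character database lists a decomposition mapping."""
    # unicodedata.decomposition() returns an empty string when there is none.
    return unicodedata.decomposition(ch) != ""


for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposable={has_decomposition(ch)}")
```

None of these letters should report a decomposition, so normalizing them to NFD leaves them unchanged.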
So formal canonical decompositions are almost entirely
confined to separable, accent-like diacritics (acute,
grave, diaeresis, and so on). The only significant exceptions are
the cedilla and ogonek, which attach smoothly to letter
bottoms without otherwise distorting them, and which
often have graphic alternates that are, indeed, separated
diacritics (comma-like and reverse-comma-like forms).
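For contrast, here is a minimal Python sketch of what canonical decomposition looks like for the accent-like cases, including the cedilla and ogonek. It uses unicodedata.normalize from the standard library; the sample letters and the helper name canonical_parts are assumptions made for illustration only.

```python
import unicodedata


def canonical_parts(ch):
    """Return the code points of the NFD form of ch as 4-digit hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


# Illustrative sample chosen for this sketch: accent-like diacritics,
# plus the two attached exceptions (cedilla and ogonek).
SAMPLES = [
    "\u00E9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u0451",  # CYRILLIC SMALL LETTER IO (ie with diaeresis)
    "\u0439",  # CYRILLIC SMALL LETTER SHORT I (i with breve)
    "\u00E7",  # LATIN SMALL LETTER C WITH CEDILLA
    "\u0105",  # LATIN SMALL LETTER A WITH OGONEK
]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"{' '.join(canonical_parts(ch))}")
```

Each of these should split into a base letter followed by a single combining mark, for example U+0327 COMBINING CEDILLA or U+0328 COMBINING OGONEK.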
--Ken