Re: AbstractCharacter class

From: KNAPPEN@MZDMZA.ZDV.UNI-MAINZ.DE
Date: Tue Jul 29 1997 - 04:58:34 EDT


John Cowan wrote:

> Thus LATIN CAPITAL LETTER O plus COMBINING DOT BELOW plus
> COMBINING CIRCUMFLEX BELOW plus COMBINING CIRCUMFLEX (to make
> up an example) can be reduced to LATIN CAPITAL LETTER O WITH
> CIRCUMFLEX AND DOT BELOW (U+1ED8) plus COMBINING CIRCUMFLEX BELOW,
> but if DOT BELOW comes after CIRCUMFLEX BELOW, the shortest reduction
> is to LATIN CAPITAL LETTER O WITH CIRCUMFLEX plus COMBINING DOT
> BELOW plus COMBINING CIRCUMFLEX BELOW.

Hmm... I think that, first of all, a canonical order of all combining marks
is needed. The combining marks fall into three classes: strike-through, below,
and above (maybe the wide combining marks form a fourth class). Note that
you cannot reorder the combining marks within one class without changing
the character:

> but if DOT BELOW comes after CIRCUMFLEX BELOW

In this case, the dot is displayed under the circumflex below, whereas in
the original case it was above it.
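
To make the idea concrete, here is a minimal sketch of such a reordering in
Python. It is only an illustration under stated assumptions: the table
MARK_CLASS, the numeric ranks, and the function name canonical_order are
hypothetical, and the class order strike-through, below, above is just one of
the two candidates. Marks are sorted by class with a stable sort, so the
relative order of marks within one class is never touched.

```python
# Hypothetical class ranks: the three class names come from the discussion,
# but the order strike-through < below < above is an assumption.
STRIKE, BELOW, ABOVE = 0, 1, 2

# Illustrative table covering only the marks used in the example.
MARK_CLASS = {
    "\u0338": STRIKE,  # COMBINING LONG SOLIDUS OVERLAY
    "\u0323": BELOW,   # COMBINING DOT BELOW
    "\u032D": BELOW,   # COMBINING CIRCUMFLEX ACCENT BELOW
    "\u0302": ABOVE,   # COMBINING CIRCUMFLEX ACCENT
}


def canonical_order(text):
    """Reorder each run of known combining marks by class.

    sorted() is stable, so marks of the same class keep their
    original relative order.
    """
    out = []
    run = []  # combining marks collected since the last base character
    for ch in text:
        if ch in MARK_CLASS:
            run.append(ch)
        else:
            out.extend(sorted(run, key=MARK_CLASS.__getitem__))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=MARK_CLASS.__getitem__))
    return "".join(out)


# Same marks, but the two "below" marks are in a different order.
a = canonical_order("O\u0302\u0323\u032D")
b = canonical_order("O\u0302\u032D\u0323")
print(a == b)
```

The two inputs differ only in the order of DOT BELOW and CIRCUMFLEX BELOW.
Both get the circumflex above moved to the end, but they stay distinct
sequences, because the order within the "below" class carries meaning.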

I am not sure which order of the classes is best (strike-through >
below > above or strike-through > above > below). A good analysis would
take the language of the text data into account (and treat, e.g., A WITH
CIRCUMFLEX as a base letter in Vietnamese), but this is impossible for
a canonical algorithm, which must work on untagged data.

--J"org Knappen


