There is already a canonical order of combining marks. It is described
in sections 3.9 and 4.2 of the Unicode Standard, Version 2.0. Ordering
information is available at http://www.unicode.org.
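
For illustration only, here is a minimal sketch of that ordering behaviour
using Python's standard unicodedata module. Two things are assumptions on
my part and not taken from the sections cited above: the data comes from
whatever Unicode version the Python interpreter bundles, and the names NFD
and NFC are later formalisations of canonical decomposition and
composition. The variable names are hypothetical. Marks with different
combining classes are sorted into one canonical order, while marks with
the same class keep their relative order.

```python
import unicodedata

O = "\u004F"           # LATIN CAPITAL LETTER O
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
CIRC_BELOW = "\u032D"  # COMBINING CIRCUMFLEX ACCENT BELOW
CIRC_ABOVE = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def code_points(text):
    """Format a string as space-separated U+XXXX code points for display."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Combining class of each mark, as recorded in the interpreter's database.
classes = {
    unicodedata.name(mark): unicodedata.combining(mark)
    for mark in (DOT_BELOW, CIRC_BELOW, CIRC_ABOVE)
}

# Marks from different classes: either input order yields the same result.
above_then_below = unicodedata.normalize("NFD", O + CIRC_ABOVE + DOT_BELOW)
below_then_above = unicodedata.normalize("NFD", O + DOT_BELOW + CIRC_ABOVE)

# Marks from the same class: their relative order is significant and is kept.
dot_first = unicodedata.normalize("NFD", O + DOT_BELOW + CIRC_BELOW + CIRC_ABOVE)
dot_last = unicodedata.normalize("NFD", O + CIRC_BELOW + DOT_BELOW + CIRC_ABOVE)

# Composed ("shortest") forms of the two sequences from the quoted example.
dot_first_composed = unicodedata.normalize("NFC", dot_first)
dot_last_composed = unicodedata.normalize("NFC", dot_last)

for name, value in classes.items():
    print(f"{name}: combining class {value}")

for label, text in [
    ("above, then below", above_then_below),
    ("below, then above", below_then_above),
    ("dot below first", dot_first),
    ("dot below last", dot_last),
    ("dot below first, composed", dot_first_composed),
    ("dot below last, composed", dot_last_composed),
]:
    print(f"{label}: {code_points(text)}")
```

The two "below" marks share a combining class, so the sketch leaves them in
the order given. That is why the two sequences in the quoted example stay
distinct and compose differently.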
Mark
KNAPPEN@MZDMZA.ZDV.UNI-MAINZ.DE wrote:
> John Cowan wrote:
>
> > Thus LATIN CAPITAL LETTER O plus COMBINING DOT BELOW plus
> > COMBINING CIRCUMFLEX BELOW plus COMBINING CIRCUMFLEX (to make
> > up an example) can be reduced to LATIN CAPITAL LETTER O WITH
> > CIRCUMFLEX AND DOT BELOW (U+1ED8) plus COMBINING CIRCUMFLEX BELOW,
> > but if DOT BELOW comes after CIRCUMFLEX BELOW, the shortest reduction
> > is to LATIN CAPITAL LETTER O WITH CIRCUMFLEX plus COMBINING DOT
> > BELOW plus COMBINING CIRCUMFLEX BELOW.
>
> Hmm... I think first of all, a canonical order of all combining marks is
> needed. The combining marks fall into three classes: strike-through, below,
> and above (maybe those wide combining marks form a fourth class). Note that
> you cannot reorder the combining marks within one class without changing
> the character:
>
> > but if DOT BELOW comes after CIRCUMFLEX BELOW
>
> In this case, the dot is displayed under the circumflex, where in the
> original case it was above.
>
> I am not sure which order of the classes is the best (strike-through >
> below > above or strike-through > above > below). A good analysis would
> take the language of the text data into account (and analyse, e.g., for
> Vietnamese, A WITH CIRCUMFLEX as a base letter), but this is impossible for
> a canonical algorithm which must work on untagged data.
>
> --J"org Knappen