From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Thu May 24 2007 - 08:55:30 CDT
Hello Agnieszka Kasprzyk,
you were asking:
> how to deal with those characters from
> transliteration standards that do not exist as precomposed characters in
> Unicode but they are combined of others BUT they may be combined in a
> number of different ways. Which is the correct way?
The ways you have outlined are "canonically equivalent", so your
software should treat all of them as equivalent.
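
One way software can treat such spellings as equivalent is to normalize both sides before comparing. The following is only a minimal sketch in Python, using the standard unicodedata module. The example letter (r with dot below and macron) and the helper name canonically_equal are assumptions chosen for illustration; they are not taken from the original question.

```python
import unicodedata

# Hypothetical example letter: r with dot below and macron (U+1E5D).
# It is chosen for illustration and is not from the original question.
spellings = [
    "\u1e5d",         # fully precomposed character
    "\u1e5b\u0304",   # r-with-dot-below, then combining macron
    "r\u0323\u0304",  # r, combining dot below, combining macron
    "r\u0304\u0323",  # r, combining macron, combining dot below
]


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Every spelling should compare equal to the first one.
all_equal = all(canonically_equal(spellings[0], s) for s in spellings)
print(all_equal)
```

The last two spellings differ only in the order of a mark below and a mark above the letter. Marks that do not interact typographically may be entered in either order, and normalization sorts them into one canonical order.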
> Other cases are for instance letters with two diacritics one over the
> other.
...
> What is the rule to follow in such cases?
Multiple diacritics on the same side of the base character stack outward
from it, in the order they are entered, so you have to enter
- either the base letter, then the closer diacritic, then the farther
  diacritic, in that order,
- or the base letter with the closer diacritic as one precomposed
  character, then the farther diacritic as a combining character.
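
The two input orders above can be checked with a short sketch, again in Python with the standard unicodedata module. The example (a with a macron close to the letter and an acute above it) and the helper name nfc are assumptions for illustration only.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize a string to NFC."""
    return unicodedata.normalize("NFC", s)


# Hypothetical example: a with a macron (closer) and an acute (farther).
separate = "a\u0304\u0301"        # base, closer mark, farther mark
partly_composed = "\u0101\u0301"  # a-with-macron, then combining acute
swapped = "a\u0301\u0304"         # acute entered first, so it sits closer

# The two recommended spellings are canonically equivalent.
same = nfc(separate) == nfc(partly_composed)
# Swapping two marks on the same side gives a different character sequence.
different = nfc(swapped) != nfc(separate)
print(same, different)
```

Both marks here sit above the letter, so their order is significant: normalization does not reorder them, and the swapped sequence is a distinct text.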
> Is there any document specifying what to do?
First reading: <http://www.unicode.org/faq/char_combmark.html>
In depth: Sections 3.2, 3.7, and 3.11 of the Unicode Standard,
<http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf>
(version 5.0 will be online really soon now)
Good luck,
Otto Stolz