From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue May 29 2007 - 09:16:27 CDT
On Tue, 29 May 2007, "Arne Götje" wrote:
> what's the Unicode policy for the Combining Overstruck diacritics,
> especially U+0335 and U+0336?
I don't remember having seen any policy statements on such issues, beyond
the properties defined for the characters. Unicode primarily encodes
characters and defines properties for them, instead of telling you which
character to use in which situation.
> Is it appropriate to use
> <i><U+0336>
> <I><U+0336>
> <l><U+0336>
> <L><U+0336>
> <u><U+0336>
> <U><U+0336>
> in an alphabet
If you are designing a new alphabet, it is up to you to choose the
characters. Different choices have different implications. In particular,
dynamic composition is still problematic (if supported at all) in many
programs.
> or should the precomposed ones (U+0268, U+0197, U+019A,
> U+023D, U+0289, U+0244) be used instead?
They are _not_ precomposed characters, and there is no defined
relationship (within Unicode) between them and the sequences you
mentioned. They may look similar, but they are quite distinct.
They should not be expected to look the same; rather the opposite.
Unicode does not analyze and decompose letters with a stroke as containing
a diacritic mark. Instead, they are coded as separate characters. (I've
never seen an explanation for this, but it's certainly too late to change
such issues, and the decision is understandable if you consider how the
"stroke" in letters varies in shape.)
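One way to check this yourself is a small sketch in Python, assuming the
standard unicodedata module reflects the Unicode Character Database closely
enough. The names STROKE_LETTERS and is_related are made up for this
illustration. The sketch prints the decomposition field of each stroke letter
and tests whether any normalization form maps the letter and the
<base, U+0336> sequence to the same string.

```python
import unicodedata

# Stroke letters from the question, each paired with the base letter of
# the <base, U+0336> sequence it was compared against.
STROKE_LETTERS = {
    "\u0268": "i",
    "\u0197": "I",
    "\u019A": "l",
    "\u023D": "L",
    "\u0289": "u",
    "\u0244": "U",
}

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_related(letter: str, base: str) -> bool:
    """Return True if some normalization form maps both to the same string."""
    sequence = base + "\u0336"  # COMBINING LONG STROKE OVERLAY
    return any(
        unicodedata.normalize(form, letter) == unicodedata.normalize(form, sequence)
        for form in FORMS
    )


for letter, base in STROKE_LETTERS.items():
    # An empty decomposition field means the character has no
    # decomposition mapping, i.e. it is not a precomposed character.
    print(
        f"U+{ord(letter):04X}",
        unicodedata.name(letter),
        repr(unicodedata.decomposition(letter)),
        is_related(letter, base),
    )
```

Every decomposition field should come out empty and every comparison false,
so normalization never turns one representation into the other.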
> Same applies to the LINE BELOW (U+0331 or U+0332?)
No, that's a different issue, because there are precomposed characters with
those characters as components.
> Should <d><D><l><L><r><R><t><T> with line below used as combined
> diacritics, or as precomposed codepoints?
It depends. You need to consider the different factors. Unicode just tells
you that there is canonical equivalence and that there are various
normalization forms. On the practical side, depending on implementations
and not on the Unicode standard, the precomposed form (when available)
is better supported by software and results in better rendering. But there
are many factors that might make the decomposed form more feasible.
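To make canonical equivalence concrete, here is a minimal sketch in Python
using the standard unicodedata module. The helper name canonically_equivalent
is an assumption of this example, not anything defined by Unicode. It compares
two strings after normalizing both to NFD, using "d" with a line below as the
test case and trying both candidate combining marks from the question.

```python
import unicodedata

precomposed = "\u1E0F"    # LATIN SMALL LETTER D WITH LINE BELOW
macron_seq = "d\u0331"    # d + COMBINING MACRON BELOW
low_line_seq = "d\u0332"  # d + COMBINING LOW LINE


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The precomposed letter decomposes to d + U+0331, so these are equivalent.
print(canonically_equivalent(precomposed, macron_seq))

# U+0332 is a different mark; no normalization relates it to U+1E0F.
print(canonically_equivalent(precomposed, low_line_seq))

# NFC composes the equivalent sequence; NFD decomposes the letter.
print(unicodedata.normalize("NFC", macron_seq) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == macron_seq)
```

If you store or compare such text, normalizing everything to one form first
(whichever you choose) avoids treating equivalent strings as different.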
> I'm asking, because I need to use <d><D><t><T> with <U+0301> anyways to
> get the desired glyph...
I guess you are referring to the practical point I mentioned. Using a
precomposed character, you can get a glyph designed by a font designer;
using a combining diacritic mark, you often get an oddly placed mark.
Theoretically, the rendering engine could map a sequence to the same glyph
as the one used for a precomposed character, but this is not common.
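Whether a precomposed character is available at all can be probed with a
rough sketch like the following, again assuming Python's unicodedata module.
The function precomposed_form is a hypothetical helper for this illustration.
It relies on NFC, so it would miss the few precomposed characters that are
excluded from composition.

```python
import unicodedata
from typing import Optional

ACUTE = "\u0301"         # COMBINING ACUTE ACCENT
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW


def precomposed_form(base: str, mark: str) -> Optional[str]:
    """Return the single character NFC composes base + mark into, if any."""
    composed = unicodedata.normalize("NFC", base + mark)
    # If NFC leaves two code points, no composable precomposed form exists.
    return composed if len(composed) == 1 else None


for mark in (ACUTE, MACRON_BELOW):
    for base in "dDtT":
        result = precomposed_form(base, mark)
        label = f"U+{ord(result):04X}" if result else "none"
        print(f"{base} + U+{ord(mark):04X} -> {label}")
```

For d, D, t and T the acute accent should yield no precomposed form, so the
combining sequence is the only option there, while the line below does have
precomposed forms for the same letters.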
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/