Re: Questions on Greek characters

From: Marco Cimarosti
Date: Thu May 18 2000 - 13:26:35 EDT

Mark Leisher wrote:
> Nick NICHOLAS wrote:
>> [...]
>> ( are already placing a
>> capital lunate sigma (and capital yod!) in U+03f4 and
>> U+03f5...
> [...]
> Everyone seems to forget that these *glyphs* in this font were
> assigned *glyph codes* of U+03f4 and U+03f5, not character
> codes. It really doesn't matter where the glyphs occur in
> the font, as long as the character-glyph mapping works, no?

Holy words. Amen.

As long as the numbers 0 to 65535 remain copyleft, a Unicode code point is the most natural glyph code for the corresponding glyph, and no one has the right, or any reason, to forbid this practice.

However, extreme care should be taken not to confuse this private convention with the Unicode standard itself. Two precautions would have helped in the present case:

1) Avoid using the "U+" prefix when talking about glyph codes (hmmm... What about "G+"?)

2) Avoid using unassigned code points. If Unicode assigns them later, the "natural" glyph code for the new Unicode character will already be occupied, and a less mnemonic mapping will become necessary.
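
A font tool could check for point 2 mechanically. The following is only a sketch in Python: the function names are made up for illustration, and the answer depends on which Unicode version the Python build ships with, so it says nothing about future assignments.

```python
import unicodedata

def is_unassigned(code):
    """True if this Python build's Unicode database reports category Cn.

    Cn covers unassigned code points and also the noncharacters
    (such as 0xFFFF), which will never be assigned.
    """
    return unicodedata.category(chr(code)) == "Cn"

def risky_glyph_codes(glyph_codes):
    """Return, sorted, the glyph codes that sit on Cn code points."""
    return sorted(c for c in glyph_codes if is_unassigned(c))
```

Any code reported by risky_glyph_codes() is a candidate for moving elsewhere before the font is released.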

In an ideal world, all extra glyphs needed by the renderer should be assigned codes that are not legal Unicode code points.

This is no problem for those who can afford 32-bit glyph codes: the whole range 0x00110000 .. 0xFFFFFFFF lies beyond the last Unicode code point (0x0010FFFF) and can safely be used for this purpose.
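
As a rough illustration of the 32-bit case, here is a Python sketch. The names (EXTRA_GLYPH_BASE, extra_glyph_code and so on) are hypothetical and not taken from any real rendering engine.

```python
UNICODE_MAX = 0x10FFFF              # last legal Unicode code point
GLYPH_CODE_MAX = 0xFFFFFFFF         # largest 32-bit glyph code
EXTRA_GLYPH_BASE = UNICODE_MAX + 1  # 0x00110000, first code outside Unicode

def is_unicode_code_point(code):
    """True if `code` falls inside the Unicode code space."""
    return 0 <= code <= UNICODE_MAX

def char_glyph_code(ch):
    """Glyphs that correspond to a character simply reuse its code point."""
    return ord(ch)

def extra_glyph_code(index):
    """Glyph code for the index-th renderer-private glyph (ligature, variant, ...)."""
    code = EXTRA_GLYPH_BASE + index
    if index < 0 or code > GLYPH_CODE_MAX:
        raise ValueError("extra glyph index out of range")
    return code
```

With this layout a glyph code below 0x00110000 always means "the glyph of that Unicode character", and anything above it is private to the renderer, so the two can never collide.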

For rendering systems limited to 16-bit glyph codes it is wiser, IMHO, to use *assigned* code points that -- because of some specific design constraint -- will never need to be rendered.

A few examples:

- A system assuming Normalization Form D (or KD) as the first step of rendering can recycle all code points which have a canonical (or compatibility) decomposition.

- A system that does not allow users to define their own private characters can overwrite the PUA.

- A system that is restricted by design to certain scripts can steal all the code points of the unimplemented scripts.

- A system that decides not to implement surrogate pairs can overwrite the high and low surrogate areas.
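
The examples above can be turned into a table of reusable 16-bit codes. What follows is a sketch in Python under my own assumptions: recyclable_codes() and its parameters are invented for illustration, the list of unimplemented scripts must be supplied by the caller as code ranges, and the decomposition test relies on whatever Unicode version Python's unicodedata module carries.

```python
import unicodedata

SURROGATES = range(0xD800, 0xE000)   # high and low surrogate areas
PRIVATE_USE = range(0xE000, 0xF900)  # Private Use Area of the 16-bit range

def recyclable_codes(form=None, reuse_pua=False, reuse_surrogates=False,
                     unimplemented=()):
    """Return the set of 16-bit codes that can be reused for extra glyphs.

    form           -- "NFD" or "NFKD" if text is normalized before rendering
    reuse_pua      -- True if users cannot define private characters
    reuse_surrogates -- True if surrogate pairs are not implemented
    unimplemented  -- ranges of code points belonging to unsupported scripts
    """
    codes = set()
    if reuse_surrogates:
        codes.update(SURROGATES)
    if reuse_pua:
        codes.update(PRIVATE_USE)
    for block in unimplemented:
        codes.update(block)
    if form is not None:
        for cp in range(0x10000):
            if cp in SURROGATES:
                continue  # surrogates are handled by their own flag
            ch = chr(cp)
            # A character that changes under normalization is replaced
            # before rendering, so its own code never needs a glyph.
            if unicodedata.normalize(form, ch) != ch:
                codes.add(cp)
    return codes
```

For instance, recyclable_codes(form="NFD", reuse_pua=True) contains 0x00E9 (it decomposes to "e" plus a combining acute) and the whole PUA, but not 0x0041.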

Notice that there is no risk of corrupting the underlying Unicode text: a decent rendering engine should do all its mappings and reorderings on an internal *copy* of the text.
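
To make the "internal copy" point concrete, here is a small Python sketch. The function name and the substitution table are hypothetical; glyph code 0xE000 merely stands for some extra glyph in a system that has decided to overwrite the PUA.

```python
def to_glyph_codes(text, substitutions):
    """Map text to a list of glyph codes without touching the text itself.

    substitutions -- hypothetical table {code point: glyph code} for the
                     characters that should be drawn with an extra glyph
    """
    glyphs = [ord(ch) for ch in text]  # internal copy, one code per character
    for i, code in enumerate(glyphs):
        # Only the copy is rewritten; all other codes stay as they are.
        glyphs[i] = substitutions.get(code, code)
    return glyphs
```

Calling to_glyph_codes(text, {0x03A3: 0xE000}) yields a glyph list in which every capital sigma is replaced by glyph code 0xE000, while text still holds plain U+03A3 and can be saved, searched or exchanged as ordinary Unicode.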

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT