Re: U+015F (ş) vs. U+0219 (ș)

From: John Hudson (tiro@tiro.com)
Date: Fri Feb 22 2002 - 17:34:46 EST


At 10:30 2/22/2002, Eric Muller wrote:

>Those characters have the expected canonical mappings, with combining
>cedilla and combining comma below respectively, so they are entirely
>distinct characters as far as Unicode is concerned. However, the last
>annotation on U+015F suggests they are the same. What is the truth?
>
>Is a glyph with a comma below a correct representation of U+015F, as the
>annotation suggests? Of course, such a font would not be usable for
>languages other than Romanian.
>Should the annotations be interpreted (and maybe changed) to something
>like: "U+015F is not used in Romanian, you are probably looking for
>U+0219; however, data encoded prior to Unicode 3.0 may have incorrectly
>used U+015F instead of U+0073 U+0326"?
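The canonical mappings Eric refers to can be inspected with Python's standard unicodedata module. This short sketch prints each decomposition and shows that normalization never turns one letter into the other: U+015F decomposes to s + COMBINING CEDILLA (U+0327), U+0219 to s + COMBINING COMMA BELOW (U+0326), and NFC composes s + U+0326 back to U+0219. The constant names are illustrative only.

```python
import unicodedata

# The two precomposed letters discussed in this thread (names are illustrative).
S_CEDILLA = "\u015F"      # LATIN SMALL LETTER S WITH CEDILLA
S_COMMA_BELOW = "\u0219"  # LATIN SMALL LETTER S WITH COMMA BELOW

for ch in (S_CEDILLA, S_COMMA_BELOW):
    # decomposition() returns the canonical mapping as hex code points.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}")

# NFD exposes the different combining marks: U+0327 versus U+0326.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", S_CEDILLA)])
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", S_COMMA_BELOW)])

# NFC of s + COMBINING COMMA BELOW composes to U+0219, never to U+015F.
print(unicodedata.normalize("NFC", "s\u0326") == S_COMMA_BELOW)  # True
```

Because the decompositions differ in the combining mark, no Unicode normalization form will equate data written with U+015F and data written with U+0219.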

U+015F is the codepoint included for Romanian support in Windows codepage
1250 and in the Mac Romanian codepage, so there is probably a *lot* of
Romanian data using this instead of the more correct Unicode assignment
U+0219. This situation appears to have occurred because these characters
were originally unified and only later disunified.
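For the large body of legacy Romanian data described here, one possible clean-up step is a simple character substitution. The sketch below is an assumption, not a recommendation from this thread: the helper name fix_romanian_s is hypothetical, it also maps the capital pair (U+015E to U+0218), and it must only be applied to text known to be Romanian, since U+015F is the correct character for languages such as Turkish. It relies on the fact that Windows-1250 byte 0xBA decodes to U+015F.

```python
# Hypothetical clean-up table; apply only to text known to be Romanian,
# because U+015F/U+015E are the correct letters in e.g. Turkish.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
})


def fix_romanian_s(text: str) -> str:
    """Map legacy cedilla-s code points to their comma-below forms."""
    return text.translate(CEDILLA_TO_COMMA)


# In Windows-1250, byte 0xBA decodes to U+015F (s with cedilla).
legacy = b"Bucure\xbati".decode("cp1250")
fixed = fix_romanian_s(legacy)
print(f"U+{ord(legacy[6]):04X} -> U+{ord(fixed[6]):04X}")  # U+015F -> U+0219
```

A text-level substitution like this changes the stored data, whereas the font-level approach in the next paragraph leaves the data alone and only changes its rendering.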

We associate the /scommaaccent/ glyph with the /scedilla/ character for the
Romanian language system in our OT fonts, using either the <locl> Localised
Forms feature or <reqd>/<dflt> feature processing, depending on our client's
preference. Neither solution is yet adequately supported by applications.
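The idea behind the <locl> approach can be shown with a toy model of glyph selection. This is a hypothetical sketch, not how a real shaping engine or font file works: the tables CMAP and LOCL and the function shape are invented for illustration, and only the glyph names and the OpenType language tag 'ROM ' come from real conventions. The character stays U+015F; only the glyph changes, and only when the application tells the shaper that the text is Romanian, which is exactly the support John notes was lacking.

```python
# Toy model of OpenType glyph selection; not a real shaping engine.
# Character-to-glyph mapping (a stand-in for the font's cmap table).
CMAP = {0x0073: "s", 0x015F: "scedilla", 0x0219: "scommaaccent"}

# Hypothetical <locl> substitutions keyed by (script tag, language tag).
LOCL = {("latn", "ROM "): {"scedilla": "scommaaccent"}}


def shape(text: str, script: str = "latn", lang: str = "dflt") -> list[str]:
    """Map characters to glyph names, then apply locl for the language."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    subs = LOCL.get((script, lang), {})
    return [subs.get(g, g) for g in glyphs]


print(shape("\u015F"))               # ['scedilla']
print(shape("\u015F", lang="ROM "))  # ['scommaaccent']
```

If the application never passes the language tag, the default branch is taken and the cedilla glyph is shown, so the Romanian form appears only when the whole text-processing chain cooperates.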

John Hudson

Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
                                               Walter Benjamin



This archive was generated by hypermail 2.1.2 : Fri Feb 22 2002 - 17:02:24 EST