Re: Discrepancy between Names List & Code Charts?

From: John Hudson (tiro@tiro.com)
Date: Thu Aug 15 2002 - 02:22:28 EDT


About the design and encoding of diacritics involving cedillas and
commaaccents:

[Note that remarks about language use are limited to a European context.]

The glyphs listed below are sometimes named /*cedilla/, but this is due to
an historical misinterpretation in both the Unicode standard and the
original version of the Adobe Glyph List:

/Gcedilla/gcedilla/
/Kcedilla/kcedilla/
/Lcedilla/lcedilla/
/Ncedilla/ncedilla/
/Rcedilla/rcedilla/

These glyphs are used in a European context only for Latvian, and the
correct form of diacritic is *not* a cedilla but the same unattached
'commaaccent' form used for Romanian S and T. [Note, however, that you
should not use the /comma/ glyph as a component below any of these letters:
it is much too large. You want a shorter, typically curved form, occupying
about the same height as the cedilla. The mark should be centred optically
below the letter.]

So these glyphs should actually be

/Gcommaaccent/gcommaaccent/
/Kcommaaccent/kcommaaccent/
/Lcommaaccent/lcommaaccent/
/Ncommaaccent/ncommaaccent/
/Rcommaaccent/rcommaaccent/

but mapped to the ...WITH CEDILLA Unicode characters.
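
As a rough illustration of the mapping described above, the sketch below
pairs each recommended glyph name with the ...WITH CEDILLA character it
would be mapped to. The dictionary and function names are my own, not part
of any font tool; the codepoints are those the Unicode standard assigns to
these characters.

```python
import unicodedata

# Glyph name -> Unicode codepoint. The glyph names follow the
# recommendation above; the codepoints are the ...WITH CEDILLA characters.
# LATVIAN_COMMAACCENT_MAP is a hypothetical name used for illustration.
LATVIAN_COMMAACCENT_MAP = {
    "Gcommaaccent": 0x0122,
    "gcommaaccent": 0x0123,
    "Kcommaaccent": 0x0136,
    "kcommaaccent": 0x0137,
    "Lcommaaccent": 0x013B,
    "lcommaaccent": 0x013C,
    "Ncommaaccent": 0x0145,
    "ncommaaccent": 0x0146,
    "Rcommaaccent": 0x0156,
    "rcommaaccent": 0x0157,
}


def character_names(mapping):
    """Return the official Unicode character name for each mapped glyph."""
    return {glyph: unicodedata.name(chr(cp)) for glyph, cp in mapping.items()}


if __name__ == "__main__":
    # Shows that the character names still say CEDILLA even though the
    # glyphs are named (and drawn) as commaaccent forms.
    for glyph, name in character_names(LATVIAN_COMMAACCENT_MAP).items():
        print(f"{glyph:14s} {name}")
```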

NB: the lowercase /gcommaaccent/ is almost always written with a variant
mark that actually sits above the letter (to avoid collision with the
descending loop); this is achieved by rotating the commaaccent mark 180
degrees and positioning it above the g. I usually include the variant
component glyph /uni0312/ (U+0312 COMBINING TURNED COMMA ABOVE) to use in
the /gcommaaccent/ composite.

Regarding the /Scedilla/ and /Tcedilla/ vs. /Scommaaccent/ and /Tcommaaccent/:

/Scedilla/scedilla/ are used only for Turkish; this must be a true cedilla.

/Scommaaccent/scommaaccent/ and /Tcommaaccent/tcommaaccent/ are used only
for Romanian; this must be the same 'comma' diacritic form discussed above
for Latvian, and should *not* be attached to the letter.

/Tcedilla/tcedilla/ is not used for any European language. (It is arguably
more appropriate for Gagauz Turkish than the 'comma' accent form, because
Gagauz also uses the /Scedilla/, but the Gagauz texts I have seen all use
the 'comma' below the T and the cedilla below the S.) Generally I do not
include the cedilla variant in fonts, and simply double map the
/Tcommaaccent/ glyph to the Unicode values discussed below.

Version 3.0 of the Unicode standard, which postdates the published WGL4
set, disunified the /Scedilla/ and /Tcedilla/ from the /Scommaaccent/ and
/Tcommaaccent/ by providing new codepoints for the latter. My
recommendation is to use the new codepoints for /Scommaaccent/ but to
double map the /Tcommaaccent/ glyph to the new codepoints and also to the
old /Tcedilla/ codepoint.
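
To make the double mapping concrete, here is a minimal sketch of a
codepoint-to-glyph table built along those lines. It is plain Python rather
than any real font tool's API, and the names GLYPH_CODEPOINTS and build_cmap
are hypothetical. The codepoints are U+015E/U+015F and U+0162/U+0163 for the
old cedilla characters, and U+0218 to U+021B for the comma-below characters
added in Unicode 3.0.

```python
# Glyph name -> list of codepoints that should resolve to that glyph.
# Only the T glyphs are double mapped; S keeps separate cedilla and
# commaaccent glyphs because Turkish needs a true cedilla.
GLYPH_CODEPOINTS = {
    "Scedilla": [0x015E],
    "scedilla": [0x015F],
    "Scommaaccent": [0x0218],
    "scommaaccent": [0x0219],
    "Tcommaaccent": [0x021A, 0x0162],  # new codepoint plus old T WITH CEDILLA
    "tcommaaccent": [0x021B, 0x0163],  # new codepoint plus old t with cedilla
}


def build_cmap(glyph_codepoints):
    """Invert the table into codepoint -> glyph name, rejecting conflicts."""
    cmap = {}
    for glyph, codepoints in glyph_codepoints.items():
        for cp in codepoints:
            if cp in cmap:
                raise ValueError(
                    f"U+{cp:04X} is claimed by both {cmap[cp]} and {glyph}"
                )
            cmap[cp] = glyph
    return cmap


if __name__ == "__main__":
    for cp, glyph in sorted(build_cmap(GLYPH_CODEPOINTS).items()):
        print(f"U+{cp:04X} -> {glyph}")
```

One glyph may serve several codepoints, but each codepoint resolves to
exactly one glyph, which is why the helper raises on a conflict.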

Note that there are text encoding issues regarding Romanian, because the
Romanian 8-bit codepages all use the old /Scedilla/ and /Tcedilla/ Unicode
codepoints, not the new codepoints for the 'comma' accent characters. In
OpenType fonts, we've addressed this (for future support) by including a
Language System tag for Romanian, and a Localised Forms <locl> feature
lookup to substitute the /Scommaaccent/ glyph for the /Scedilla/. This
feature is not yet supported in any systems or applications, but I'm
reasonably certain that it will be.
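
The effect of such a lookup can be imitated in a few lines. This is only a
simulation on lists of glyph names, not OpenType feature code and not how
any layout engine is implemented; ROMANIAN_LOCL and apply_locl are
hypothetical names, and I am assuming "ROM" as the Romanian language system
tag.

```python
# Single substitutions applied only under the Romanian language system.
# T needs no entry because /Tcommaaccent/ is already double mapped.
ROMANIAN_LOCL = {
    "Scedilla": "Scommaaccent",
    "scedilla": "scommaaccent",
}


def apply_locl(glyphs, language):
    """Return the glyph run after a simulated <locl> substitution.

    `language` is an OpenType-style language system tag; "ROM" for
    Romanian is an assumption made for this sketch.
    """
    if language != "ROM":
        # Other languages (Turkish, for example) keep the true cedilla.
        return list(glyphs)
    return [ROMANIAN_LOCL.get(name, name) for name in glyphs]


if __name__ == "__main__":
    run = ["Scedilla", "i", "Tcommaaccent", "a", "scedilla"]
    print(apply_locl(run, "ROM"))
    print(apply_locl(run, "TRK"))
```

Text encoded with the old codepoints thus keeps its encoding; only the
displayed glyph changes when the text is tagged as Romanian.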

John Hudson

Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton


