About the design and encoding of diacritics involving cedillas and
commaaccents:
[Note that remarks about language use are limited to a European context.]
The following glyphs are sometimes named /*cedilla/, but this is due to an
historical misinterpretation in both the Unicode standard and the original
version of the Adobe Glyph List:
/Gcedilla/gcedilla/
/Kcedilla/kcedilla/
/Lcedilla/lcedilla/
/Ncedilla/ncedilla/
/Rcedilla/rcedilla/
These glyphs are used in a European context only for Latvian, and the
correct form of diacritic is *not* a cedilla but the same unattached
'commaaccent' form used for Romanian S and T. [Note, however, that you
should not use the /comma/ glyph as a component below any of these letters:
it is much too large. You want a shorter, typically curved form, occupying
about the same height as the cedilla. The mark should be centred optically
below the letter.]
So these glyphs should actually be
/Gcommaaccent/gcommaaccent/
/Kcommaaccent/kcommaaccent/
/Lcommaaccent/lcommaaccent/
/Ncommaaccent/ncommaaccent/
/Rcommaaccent/rcommaaccent/
but mapped to the ...WITH CEDILLA Unicode characters.
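A minimal sketch of that mapping, assuming Python purely for illustration:
the table name LATVIAN_GLYPHS and the helper unicode_names are hypothetical,
not part of any font tool. It pairs the 'commaaccent' glyph names with the
codepoints that Unicode still names ...WITH CEDILLA.

```python
import unicodedata

# Hypothetical table: glyph name -> Unicode codepoint.
# The glyph names say "commaaccent"; the Unicode names say "WITH CEDILLA".
LATVIAN_GLYPHS = {
    "Gcommaaccent": 0x0122, "gcommaaccent": 0x0123,
    "Kcommaaccent": 0x0136, "kcommaaccent": 0x0137,
    "Lcommaaccent": 0x013B, "lcommaaccent": 0x013C,
    "Ncommaaccent": 0x0145, "ncommaaccent": 0x0146,
    "Rcommaaccent": 0x0156, "rcommaaccent": 0x0157,
}


def unicode_names(table):
    """Return {glyph name: official Unicode character name}."""
    return {glyph: unicodedata.name(chr(cp)) for glyph, cp in table.items()}


if __name__ == "__main__":
    for glyph, name in unicode_names(LATVIAN_GLYPHS).items():
        print(f"/{glyph}/ -> {name}")
```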
NB: the lowercase /gcommaaccent/ is almost always written with a variant
mark that actually sits above the letter (to avoid collision with the
descending loop); this is achieved by rotating the commaaccent mark 180
degrees and positioning it above the g. I usually include the variant
ingredient glyph /uni0312/ to use in the /gcommaaccent/ composite.
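The rotate-and-reposition step can be sketched as plain coordinate
arithmetic. Everything here is an assumption made for illustration: the
outline COMMA_BELOW is an invented three-point shape in font units, and the
function names and the lift value are hypothetical. A real mark would be
drawn and positioned by eye.

```python
# Hypothetical outline of a 'commaaccent' mark sitting below the baseline.
COMMA_BELOW = [(200, -60), (260, -60), (230, -220)]


def rotate_180(points, centre):
    """Rotate (x, y) points by 180 degrees about the given centre."""
    cx, cy = centre
    return [(2 * cx - x, 2 * cy - y) for x, y in points]


def turned_comma_above(points, lift):
    """Rotate the mark about the centre of its bounding box, then raise it.

    `lift` is the vertical shift (in font units) that moves the rotated
    mark from below the baseline to above the g.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    centre = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    rotated = rotate_180(points, centre)
    return [(x, y + lift) for x, y in rotated]


if __name__ == "__main__":
    print(turned_comma_above(COMMA_BELOW, lift=900))
```

Rotating about the centre of the bounding box keeps the mark in the same
horizontal position, so only the vertical shift has to be chosen.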
Regarding the /Scedilla/ and /Tcedilla/ vs. /Scommaaccent/ and /Tcommaaccent/:
/Scedilla/scedilla/ are used only for Turkish; this must be a true cedilla.
/Scommaaccent/scommaaccent/ and /Tcommaaccent/tcommaaccent/ are used only
for Romanian; this must be the same 'comma' diacritic form discussed above
for Latvian, and should *not* be attached to the letter.
/Tcedilla/tcedilla/ are not used for any European language (the cedilla is
arguably more appropriate for Gagauz Turkish than the 'comma' accent form,
because Gagauz also uses the /Scedilla/, but the Gagauz texts I have seen all
use the 'comma' below the T and the cedilla below the S). Generally I do not
include the cedilla variant in fonts, and simply double map the
/Tcommaaccent/ to the Unicode values discussed below.
Version 3.0 of the Unicode standard, which postdates the published WGL4
set, disunified the /Scedilla/ and /Tcedilla/ from the /Scommaaccent/ and
/Tcommaaccent/ by providing new codepoints for the latter. My
recommendation is to use the new codepoints for /Scommaaccent/ but to
double map the /Tcommaaccent/ glyph to the new codepoints and also to the
old /Tcedilla/ codepoint.
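The recommendation above can be sketched as a character-to-glyph table. This
is only an illustration in Python: the names CMAP and codepoints_for are
hypothetical, and a real font would store this in its cmap table rather than
in a dictionary.

```python
# Hypothetical character map: Unicode codepoint -> glyph name.
CMAP = {
    0x015E: "Scedilla",      # S WITH CEDILLA: Turkish, true cedilla
    0x015F: "scedilla",
    0x0218: "Scommaaccent",  # S WITH COMMA BELOW: new codepoint only
    0x0219: "scommaaccent",
    0x021A: "Tcommaaccent",  # T WITH COMMA BELOW: new codepoint
    0x021B: "tcommaaccent",
    0x0162: "Tcommaaccent",  # T WITH CEDILLA: old codepoint, double mapped
    0x0163: "tcommaaccent",
}


def codepoints_for(glyph, cmap=CMAP):
    """Return the sorted list of codepoints that map to a glyph name."""
    return sorted(cp for cp, name in cmap.items() if name == glyph)


if __name__ == "__main__":
    for glyph in ("Scedilla", "Scommaaccent", "Tcommaaccent"):
        hexes = ", ".join(f"U+{cp:04X}" for cp in codepoints_for(glyph))
        print(f"/{glyph}/: {hexes}")
```

Note that no /Tcedilla/ glyph appears in the table at all: both the old and
the new T codepoints resolve to the 'comma' form.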
Note that there are text encoding issues regarding Romanian, because the
Romanian 8-bit codepages all map to the old /Scedilla/ and /Tcedilla/ Unicode
codepoints, not the new codepoints for the 'comma' accent characters. In
OpenType fonts, we've addressed this (for future support) by including a
Language System tag for Romanian, and a Localised Forms <locl> feature
lookup to substitute the /Scommaaccent/ glyph for the /Scedilla/. This
feature is not yet supported in any systems or applications, but I'm
reasonably certain that it will be.
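As a rough simulation of what such a lookup would do once a layout engine
applies it (this is not how any particular engine is implemented, and the
names LOCL_ROMANIAN and apply_locl are hypothetical), the substitution only
fires when the text is tagged with the Romanian language system, 'ROM ' in
the OpenType tag registry:

```python
ROMANIAN = "ROM "  # OpenType language system tag for Romanian
TURKISH = "TRK "   # OpenType language system tag for Turkish

# Hypothetical stand-in for the <locl> lookup under the Romanian
# language system: cedilla glyph -> 'comma' accent glyph.
LOCL_ROMANIAN = {
    "Scedilla": "Scommaaccent",
    "scedilla": "scommaaccent",
}


def apply_locl(glyphs, language):
    """Return the glyph run after the localised-forms substitution.

    Runs in any language other than Romanian are returned unchanged,
    so Turkish text keeps its true cedilla.
    """
    if language != ROMANIAN:
        return list(glyphs)
    return [LOCL_ROMANIAN.get(g, g) for g in glyphs]


if __name__ == "__main__":
    run = ["scedilla", "i"]
    print(apply_locl(run, ROMANIAN))
    print(apply_locl(run, TURKISH))
```

No T substitution is needed in this sketch, because with the double mapping
described earlier the old T codepoint already reaches the 'comma' glyph.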
John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com
Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton
This archive was generated by hypermail 2.1.2 : Thu Aug 15 2002 - 00:26:14 EDT