From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Nov 06 2003 - 19:52:23 EST
António Martins-Tválkin wrote:
> Anyway -- who ever decided that cedilla and undercomma are different
> things? Do they have different origins? Any language / orthography using
> both distinctly?...
I don't know whether undercomma is in origin distinct from cedilla or is
historically an adaptation of the cedilla. I *suspect* the latter.
Even given a common origin, it is debatable whether they should now be
considered the same or not. That is why there is a problem. It isn't cut
and dried.
The MARC 21 and ANSEL character sets distinguished the two as CEDILLA
and LEFT HOOK (for the undercomma) though it is dubious whether the
originators of these sets knew what this "left hook" was. See
http://lcweb2.loc.gov/cocoon/codetables/45.html for current ANSEL
specifications and
http://www.niso.org/standards/resources/Z39-47-1993(R2002).pdf for the 1993
table, where it was notoriously given the name "LEFT HOOF".
Its identity with the undercomma is asserted at
http://www.niso.org/international/SC4/Wg1_240.pdf:
<<
5/2 HOOK TO LEFT
In ISO 5426, this character is annotated ' used in Latvian, Romanian.'
Because of this use, the most appropriate mapping is to U+0326 COMBINING
COMMA BELOW (annotated as 'variant of the following' [combining cedilla]
in the Unicode Standard).
>>
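To make the two combining marks in that mapping concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name compose() is my own invention for illustration. It only shows what the Unicode character database says about the two marks and which precomposed letter each one yields with _s_ under NFC normalization.

```python
import unicodedata

COMMA_BELOW = "\u0326"  # the mark ISO 5426 "HOOK TO LEFT" is mapped to
CEDILLA = "\u0327"      # the "following" character in the Unicode annotation


def compose(base, mark):
    """Return the NFC (precomposed) form of base + combining mark.

    Hypothetical helper, named here only for illustration.
    """
    return unicodedata.normalize("NFC", base + mark)


for mark in (COMMA_BELOW, CEDILLA):
    composed = compose("s", mark)
    print(
        f"U+{ord(mark):04X} {unicodedata.name(mark)} + s -> "
        f"U+{ord(composed):04X} {unicodedata.name(composed)}"
    )
```

The two marks compose to different precomposed letters, so at the encoding level they are distinct even where a font draws them alike.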
The original ISO 8859 character sets were constructed under the
philosophy that differences between cedilla and undercomma were only
stylistic. The default images in those tables and in Unicode Standard
versions 1 and 2 showed a cedilla form throughout.
However users of Latvian and Romanian insisted firmly that cedilla forms
were not historically correct for printed material in those languages.
It was *only* increasing use of fonts created outside of eastern Europe
that had caused the incorrect cedilla shape to be seen, especially as
computer technology took hold.
For Latvian (and Livonian), the problem was easily solved within
standard character sets by font designers using the undercomma character
beneath all letters except _c_ or _s_ .
However, Romanian _s_, which traditionally had an undercomma, conflicted
with Turkish _s_ with cedilla.
The result was a Romanian proposal to add precomposed characters with
undercomma for uppercase and lowercase _s_ and _t_.
See ISO/IEC JTC 1/SC 2/WG 2 N1604 (1997) at
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1604.htm :
<<
*RESOLUTION M33.24 (4 Latin characters):
_Netherland Negative._*
WG 2 accepts the following four Latin characters (requested by Romania),
their names and shapes to be encoded in the BMP as follows:
0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
0219 LATIN SMALL LETTER S WITH COMMA BELOW
021A LATIN CAPITAL LETTER T WITH COMMA BELOW
021B LATIN SMALL LETTER T WITH COMMA BELOW
in accordance with document N1361.
See resolution M33.26 for further processing.
>>
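For anyone who wants to see these four characters next to their older cedilla counterparts, here is a rough Python sketch. The pairing table and the function to_comma_below() are assumptions of mine, not part of any standard. Such a conversion is only reasonable for text already known to be Romanian, since Turkish legitimately uses _s_ with cedilla.

```python
import unicodedata

# (cedilla form, comma-below form) pairs; the pairing is an illustrative
# assumption for Romanian text, not a mapping defined by Unicode.
PAIRS = [
    ("\u015E", "\u0218"),  # capital S
    ("\u015F", "\u0219"),  # small s
    ("\u0162", "\u021A"),  # capital T
    ("\u0163", "\u021B"),  # small t
]

CEDILLA_TO_COMMA = str.maketrans(dict(PAIRS))


def to_comma_below(text):
    """Replace cedilla forms of s/t with comma-below forms.

    Hypothetical helper: apply only to text known to be Romanian.
    """
    return text.translate(CEDILLA_TO_COMMA)


for old, new in PAIRS:
    print(f"U+{ord(old):04X} {unicodedata.name(old)}")
    print(f"  -> U+{ord(new):04X} {unicodedata.name(new)}")
```

Nothing here helps with the font problem, of course. It only changes which code points are stored.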
But Romanians are still frustrated because most fonts distributed as
part of computer operating systems or otherwise available do not support
these characters.
ISO 8859/16 (intended as a replacement for ISO 8859/2) specifically
designates undercomma rather than cedilla with _s_, _S_, _t_, _T_. See
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT
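The difference between the two parts can be checked with a short Python sketch that decodes the same four byte values under each codec. The byte positions 0xAA, 0xBA, 0xDE and 0xFE are taken from the published mapping tables as I read them, and the variable names are mine.

```python
import unicodedata

# The four code positions where the two parts of ISO 8859 differ for s and t.
POSITIONS = bytes([0xAA, 0xBA, 0xDE, 0xFE])

as_part_2 = POSITIONS.decode("iso8859_2")    # Latin-2
as_part_16 = POSITIONS.decode("iso8859_16")  # Latin-10

for byte, old, new in zip(POSITIONS, as_part_2, as_part_16):
    print(f"0x{byte:02X}: 8859-2  -> U+{ord(old):04X} {unicodedata.name(old)}")
    print(f"      8859-16 -> U+{ord(new):04X} {unicodedata.name(new)}")
```

The same byte thus yields a cedilla letter under part 2 and a comma-below letter under part 16.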
For the Netherlands opposition see
http://wwwold.dkuug.dk/JTC1/SC2/WG3/docs/n441.pdf .
Since there is no linguistic tradition in any language for _t_ with a
cedilla shape beneath, most modern fonts display an undercomma beneath
U+0162, U+0163 instead of a cedilla shape.
It is really only with _s_ that there are two conflicting usages.
There are actually three conflicting uses, since Gagauz traditionally
uses a cedilla shape under _c_, an undercomma beneath _t_, and a symbol
halfway between the two under _s_. See
http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html
Jim Allan