Re: Latin g with cedilla above

From: Michael Everson (everson@indigo.ie)
Date: Fri Oct 02 1998 - 04:15:12 EDT


At 17:00 -0700 1998-10-01, Kenneth Whistler wrote:

>If you want to see them, go to the Unicode Standard, Version 1.0, page
>180, where they are printed larger than life:
>
>g with turned comma above (the preferred form)
>g with cedilla below

(which Latvians hate and no one uses; this is a GHOST glyph)

>g with acute above

(which is a typographical error for the turned comma above)

>For technical reasons, the Unicode Standard, Version 2.0 stopped printing
>multiple glyphs in a single code cell.

But rumour has it that Unicode 3.0 will NOT show the correct Latvian glyph,
but the bogus cedillified glyph, which will do nothing but encourage font
makers to make crap glyphs the Latvians can't use, further stimulating the
thriving Latvian pirated font industry, which exists because no one ever
listened.

(Note the market relevance.)

>Memory did not serve, but I concur that the second two glyph forms would
>be seen by Latvians as errors. It would be better for Unicode 2.0 to show
>the preferred Latvian glyph, as that is the most commonly occurring.

Tell the President of AFII. You needn't convince me.....

>> As is the note in the Unicode Standard. And the normative mapping should be
>> to COMBINING COMMA BELOW. Never mind the bloody name.
>
>Not *mapping*, but *decomposition*. And in this case, as for all of
>these hook-to-the-left-below's for Latin letters, there is really
>no clear, hard and fast distinction between the cedilla and the comma
>below.

Golly, I couldn't disagree more. Despite the wide availability of
incorrect fonts (which does lead to people getting used to the bad glyphs),
little Turkish children are shown cedillas in their ABC books and little
Romanian children are shown commas below. The ur-characters are NOT the
same. (And yes, I have seen JvW's newspapers.)

We keep having this argument because of 8-bit technology. Space was
precious so these characters (notionally the UCS' COMBINING CEDILLA and
COMBINING COMMA BELOW) were unified.
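
To illustrate the distinction being argued for, here is a minimal sketch in Python using the standard unicodedata module. It prints the canonical decomposition of one letter with a cedilla and one with a comma below, as recorded in whatever character database ships with the Python in use. The choice of U+015F and U+0219 as the Turkish and Romanian examples, and the helper name decomposition_names, are assumptions made for this sketch and do not come from the message.

```python
import unicodedata


def decomposition_names(ch):
    """Return the character names in the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


# Example letters chosen for this sketch (not taken from the message).
turkish_s = "\u015F"   # s with cedilla
romanian_s = "\u0219"  # s with comma below

print(decomposition_names(turkish_s))
print(decomposition_names(romanian_s))
```

If the two marks were the same character, both lists would end with the same name; the sketch only reports what the bundled data says and takes no position on glyph design.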

>At this point, in my opinion, it would be *more* pernicious
>to change the decomposition for this character than to simply document
>the fact that this is one of the collection of cedilla/comma below
>Latin characters that have multiple glyph shapes. Come on, folks,
>this is easy compared to Indic scripts.

Is it? It sounds as though you wish to perpetuate the 8-bit error and its
sad legacy. Or is the magic of smart fonts (tools for which aren't
available yet to me) and keyboards supposed to fix it?

>> > As is the note in the Unicode Standard. And the normative mapping should be
>> > to COMBINING COMMA BELOW. Never mind the bloody name.
>>
>> Or rather COMBINING TURNED COMMA ABOVE, I presume.
>>
>Absolutely not. Decompositions are done for the *characters*, not for
>the *glyphs*. And for Latin uppercase/lowercase pairs, it is an
>ironclad rule that canonical decompositions must involve the same sequence
>of combining marks. This is because a casing operation which operates
>on the baseform letter of the combining character sequence should produce
>the correct results for the entire sequence. You cannot produce a
>decomposition for U+0123 based on its glyph appearance, independently of
>consideration of its uppercase form U+0122.

I stand corrected. The Latvians would use COMBINING COMMA BELOW with both g
and G.
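
The casing rule quoted above can be checked with a short Python sketch using the standard unicodedata module. It assumes only the two code points named in the message, U+0123 and U+0122; the helper name marks is hypothetical. The check does not depend on which combining mark the decomposition uses, only on the lowercase and uppercase letters using the same one.

```python
import unicodedata

lower_g = "\u0123"  # lowercase form named in the message
upper_g = "\u0122"  # its uppercase counterpart


def marks(ch):
    """Return everything after the base letter in the canonical decomposition."""
    return unicodedata.normalize("NFD", ch)[1:]


# Both case forms should carry the same sequence of combining marks.
same_marks = marks(lower_g) == marks(upper_g)

# Uppercasing the decomposed sequence changes only the base letter, so
# recomposing it should give the precomposed uppercase character.
decomposed = unicodedata.normalize("NFD", lower_g)
round_trip = unicodedata.normalize("NFC", decomposed.upper()) == upper_g

print(same_marks, round_trip)
```

Had the lowercase letter been decomposed by glyph appearance (a mark above) while the uppercase kept a mark below, the second check would fail, which is the problem the quoted rule guards against.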

--
Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire
