Re: S with comma/cedilla

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Sep 01 1998 - 13:59:54 EDT


Kevin Bracey asked:

>
> I've been trying to pin down what's going on with S comma/cedilla.
> As I recall there was confusion over whether character sets like Latin-2
> had a comma or cedilla under the S. Currently, my UCS list says
>
> U+015E LATIN CAPITAL LETTER S WITH CEDILLA
>
> and Latin-2 contains this character. Now, as I recall, the problem was
> that this had to actually have a comma below, but the misleading name and
> crummy character drawings in ISO 8859-2:1987 caused everyone, including
> Unicode, to put a cedilla underneath.
>
> The proposed allocations page has:
>
> U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW

Michael Everson provided a clarification regarding this. But I'll
add to that.

U+015E LATIN CAPITAL LETTER S WITH CEDILLA
U+015F LATIN SMALL LETTER S WITH CEDILLA

are in Unicode/10646 and have been there from the beginning. These
are the characters that map to 8859-2 Latin-2, and will be present
in data sets encoded in Latin-2. These are also the characters which
map to Windows 1254 (Turkish), Code Page 852 (Latin-2), etc. They
will be found in data that may be either Turkish or Romanian, and
you can expect different requirements on the exact glyphic form to
use for display, depending on the language and locale involved.
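As a quick check of these mappings, the following Python sketch decodes byte 0xAA (position 10/10) from ISO 8859-2 and then encodes U+015E with Python's built-in `iso8859_2`, `cp1254`, and `cp852` codecs. The codec names are Python's spellings, and the byte values come from Python's own codec tables.

```python
import unicodedata

# Python's codec names for the code pages listed above.
LEGACY_CODECS = ("iso8859_2", "cp1254", "cp852")

def legacy_encodings(ch: str) -> dict[str, bytes]:
    """Map each legacy codec name to the bytes it uses for `ch`."""
    return {codec: ch.encode(codec) for codec in LEGACY_CODECS}

# Byte 0xAA is position 10/10 in ISO 8859-2.
ch = b"\xaa".decode("iso8859_2")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
for codec, data in legacy_encodings(ch).items():
    print(codec, data.hex())
```

The same character therefore sits at different byte positions in each code page (0xAA in Latin-2, 0xDE in Windows 1254), but all of them decode to the one UCS character U+015E.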

U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW
U+0219 LATIN SMALL LETTER S WITH COMMA BELOW

are in Amendment 18 to 10646 (and have already been accepted for
the Unicode Standard, as well). These were added at the request
of the Romanian national body. They do not map to any existing
code pages or international 8-bit character sets, and will not
be in preexisting data. The intention was to have characters that
unambiguously take the preferred Romanian form. Amendment 18 is
almost complete in its balloting process, so as Michael pointed
out, these two characters are "almost" in the UCS.
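The difference between the two pairs can be seen with Python's `unicodedata` module and codecs. This sketch (assuming Python 3.9 or later) reuses the three code pages named earlier. It shows that U+0218 decomposes to S + U+0326 COMBINING COMMA BELOW while U+015E decomposes to S + U+0327 COMBINING CEDILLA, so the two are not canonically equivalent, and that none of those legacy codecs can encode U+0218.

```python
import unicodedata

# Python's codec names for the pre-existing code pages named earlier.
LEGACY_CODECS = ("iso8859_2", "cp1254", "cp852")

def encodable_in(ch: str) -> list[str]:
    """Return the legacy codecs that can represent `ch` (possibly none)."""
    result = []
    for codec in LEGACY_CODECS:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue  # this code page has no byte for `ch`
        result.append(codec)
    return result

for ch in ("\u015e", "\u0218"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print("  decomposition:", unicodedata.decomposition(ch))
    print("  encodable in:", encodable_in(ch) or "none")
```

Because the decompositions differ, normalization will never fold one form into the other; converting legacy cedilla data to the comma-below characters has to be an explicit, language-aware step.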

>
> Now, my Adobe Glyph List, which this has to agree with, says:
>
> U+015E LATIN CAPITAL LETTER S WITH CEDILLA = Scommaaccent
> U+1E9E LATIN CAPITAL LETTER S WITH COMMA BELOW = Scedilla
> U+F6C1 LATIN CAPITAL LETTER S WITH CEDILLA = Scedilla (Duplicate)

U+F6C1 is a user-defined character. U+1E9E is an unassigned (and
reserved) code point--nobody should be using it for anything.

Perhaps Adobe should clarify what it is doing on this.
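A glyph-list loader could flag entries like these automatically. The sketch below uses a hypothetical `suspicious` helper and a made-up excerpt echoing the quoted list, and relies on `unicodedata.category` to flag private-use (`Co`) and unassigned (`Cn`) code points. Note that it consults the Unicode database bundled with the running Python, not the 1998 repertoire: U+1E9E has since been assigned (as LATIN CAPITAL LETTER SHARP S), so a current Python will not flag it.

```python
import unicodedata

def suspicious(entries: dict[int, str]) -> dict[int, str]:
    """Flag code points that are private use (Co) or unassigned (Cn)."""
    flagged = {}
    for cp, glyph in entries.items():
        category = unicodedata.category(chr(cp))
        if category == "Co":
            flagged[cp] = f"{glyph}: private use"
        elif category == "Cn":
            flagged[cp] = f"{glyph}: unassigned"
    return flagged

# Hypothetical excerpt echoing the quoted glyph-list entries.
excerpt = {0x015E: "Scommaaccent", 0x1E9E: "Scedilla", 0xF6C1: "Scedilla"}
for cp, note in suspicious(excerpt).items():
    print(f"U+{cp:04X} {note}")
```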

>
> I'm now immensely confused. Is S WITH COMMA BELOW U+0218 or U+1E9E?
> Is it the case that Adobe has always had the rendering right, so
> U+015E is Scommaaccent? Should that not be changed to Scedilla? What's
> the sense in U+1E9E being Scedilla?

There are basically two glyphs: {Swithcedilla} and {Swithcommabelow}. You
can expect to find either glyph for rendering the character
U+015E LATIN CAPITAL LETTER S WITH CEDILLA. You should expect to find
only {Swithcommabelow} for rendering the character
U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW.
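This rendering rule can be sketched as a lookup table plus a language test. The glyph names are placeholders in the {Swithcedilla}/{Swithcommabelow} notation used here (the lowercase forms are hypothetical extensions), and the policy that only Romanian text (`ro`) gets the comma form for U+015E/U+015F is an assumption for illustration. A real font or renderer may apply other rules.

```python
# Placeholder glyph names in the notation used above; not from any real font.
# Each entry is (glyph for non-Romanian text, glyph for Romanian text).
GLYPHS = {
    "\u015e": ("Swithcedilla", "Swithcommabelow"),     # U+015E: either glyph
    "\u015f": ("swithcedilla", "swithcommabelow"),     # U+015F: either glyph
    "\u0218": ("Swithcommabelow", "Swithcommabelow"),  # U+0218: comma only
    "\u0219": ("swithcommabelow", "swithcommabelow"),  # U+0219: comma only
}

def choose_glyph(ch: str, lang: str) -> str:
    """Pick a display glyph for an S-with-mark character in language `lang`."""
    default_form, romanian_form = GLYPHS[ch]
    # Assumed policy: only the primary subtag "ro" selects the comma form.
    primary = lang.split("-")[0].lower()
    return romanian_form if primary == "ro" else default_form
```

For example, `choose_glyph("\u015e", "tr")` gives the cedilla glyph, `choose_glyph("\u015e", "ro-RO")` gives the comma glyph, and U+0218 always gives the comma glyph whatever the language.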

I hope this helps.

--Ken Whistler

>
> And then, what is character 10/10 in Latin-2? U+015E, U+0218 or U+1E9E?
> Scommaaccent or Scedilla? Help!
>
> --
> Kevin Bracey, Senior Software Engineer
> Acorn Computers Ltd Tel: +44 (0) 1223 725228
> Acorn House, 645 Newmarket Road Fax: +44 (0) 1223 725328
> Cambridge, CB5 8PB, United Kingdom WWW: http://www.acorn.co.uk/
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT