Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 29 2003 - 14:33:47 CST


----- Original Message -----
From: "John Hudson" <tiro@tiro.com>
To: <kentk@cs.chalmers.se>
Cc: "'Jim Allan'" <jallan@smrtytrek.com>; <unicode@unicode.org>
Sent: Wednesday, October 29, 2003 6:15 PM
Subject: RE: Merging combining classes, was: New contribution N2676

> At 04:04 AM 10/29/2003, Kent Karlsson wrote:
>
> >The Latvian "cedillas" are really commas below, and are best encoded so.
> >Still for lowercase g (not for uppercase) the comma below is _rendered_
> >as a turned comma above.
>
> The 'not for uppercase' rule depends on the design of the uppercase letter.
> Typically, there is no descending portion, so the 'comma' accent goes
> below; in some handwriting typefaces and with swash letters, the G may have
> a descending stroke. In this case the accent is turned and placed above,
> just as it is for the lowercase. Of course, it is encoded as the comma
> below. The attached examples are from the version of Hermann Zapf's Zapfino
> that ships with Apple's OS X.

So Latvian "cedillas" (as well as the Romanian ones) should be encoded with
the comma below and not the cedilla. But there is a huge legacy of text that
uses characters with a cedilla instead of the comma below, because the cedilla
characters were well supported (thanks to Turkish) while the comma below has
a long history of poor support.
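
To make the two encodings concrete, here is a minimal Python sketch, not a
definitive converter. It covers only the four Romanian letters, for which
Unicode has separate precomposed comma-below code points. The table name
CEDILLA_TO_COMMA and the function to_comma_below are hypothetical names used
for illustration, and the sketch assumes the input is known to be Romanian
(Turkish text legitimately uses S with cedilla and must not be converted).

```python
import unicodedata

# Assumption: the input is Romanian text typed with the legacy cedilla
# code points. Maps each cedilla letter to its comma-below counterpart.
CEDILLA_TO_COMMA = {
    "\u015E": "\u0218",  # capital S
    "\u015F": "\u0219",  # small s
    "\u0162": "\u021A",  # capital T
    "\u0163": "\u021B",  # small t
}

_TABLE = str.maketrans(CEDILLA_TO_COMMA)


def to_comma_below(text):
    """Replace legacy cedilla letters with the comma-below letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # Show the official Unicode names and canonical decompositions, so
    # the cedilla (U+0327) / comma below (U+0326) difference is visible.
    for old, new in CEDILLA_TO_COMMA.items():
        print(f"U+{ord(old):04X} {unicodedata.name(old)}"
              f" [{unicodedata.decomposition(old)}]")
        print(f"  -> U+{ord(new):04X} {unicodedata.name(new)}"
              f" [{unicodedata.decomposition(new)}]")
```

The mapping is deliberately a fixed table rather than a generic "replace
U+0327 with U+0326" rule, so that unrelated cedilla letters such as the
French c with cedilla are left alone.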

A common example is the default ANSI charset of Windows for Latvian and
Romanian, which simply does not have the comma below and so forces users to
encode these characters with cedillas...
As a result, users have had to build custom fonts whose glyphs for the
characters coded with a cedilla are drawn with a comma below...
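
The charset limitation can be checked directly. The sketch below is only an
illustration: it assumes that Python's standard codecs cp1250, cp1257 and
iso8859_2 faithfully reflect the Windows Central European and Baltic code
pages and ISO 8859-2, and the helper name encodable is hypothetical.

```python
def encodable(ch, codec):
    """Return True if the character has a code in the given legacy charset."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Characters to probe: two legacy cedilla code points, their Romanian
# comma-below counterparts, and one Latvian letter.
SAMPLES = {
    "s with cedilla": "\u015F",
    "s with comma below": "\u0219",
    "t with cedilla": "\u0163",
    "t with comma below": "\u021B",
    "g with cedilla (Latvian)": "\u0123",
}

if __name__ == "__main__":
    for codec in ("cp1250", "cp1257", "iso8859_2"):
        for label, ch in SAMPLES.items():
            status = "yes" if encodable(ch, codec) else "no"
            print(f"{codec:10} {label:26} U+{ord(ch):04X}: {status}")
```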

Even today, it is quite hard to find a Romanian or Latvian web page that
uses the new Unicode characters with a comma below: even government sites
use the characters coded with the cedilla, and they accept that the comma
below is rendered only approximately, since this causes no interpretation
problems for readers. In these countries, document authors choose between
the Central European and Turkish ISO charsets, and they avoid the comma-below
characters because these are not rendered at all (or are displayed as a
missing-glyph square) on most platforms...

For example, on Windows, the comma below is most often supported only if
users have installed MS Office, which includes the "Arial Unicode MS" font
capable of displaying it. Once Microsoft offers this font as a free download
to all Internet Explorer users, there will be far fewer problems, and
we'll probably see more texts encoded correctly with the comma below.

Maybe we could lobby here for Microsoft to include at least the characters
for Latvian and Romanian (at least the precomposed characters, even if a
decomposed comma below is not rendered correctly) in an update of its
"Times New Roman", "Arial", "Verdana" and "Tahoma" fonts for the web...


