Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Michael Everson
Date: Fri, 21 Jun 2013 08:39:10 +0100

On 21 Jun 2013, at 07:01, Denis Jacquerye wrote:

> About positioning:
> Michael, you mentioned the issue of positioning of the diacritic, this
> is a font issue not a character issue. I mentioned Navajo ogonek
> because that is how it solves the issue of positioning, custom Navajo
> fonts have centered ogoneks.

Really? To the point that Navajo users reject fonts where the ogoneks are a bit further to the right, as for Polish and Lithuanian?

> About the shape:
> As mentioned the issue of the shape can be solved at the locale
> setting and font level.

As both I and Asmus pointed out, that's not reliable.

> If this needs to be solvable at the character level, using <CGJ,
> combining cedilla> already works, font shapers do not replace the
> glyph.

I don't think it's wise to suggest to 40,000 Marshallese including school children that they should have to formalize the use of an invisible (and ignorable?) character in their orthography.

> Even if <CGJ, combining dieresis> was not intended to indicate shape
> or positioning difference, a font can easily have different shape or
> positioning for it. For <CGJ, combining cedilla>, the CGJ already
> prevents normalization to what most fonts have as a comma below glyph.

Yeah, and I might do something like that locally for fine typography (though I'd probably just ligate) but this is a practical orthography for a national language.
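
The normalization behaviour under discussion here can be checked with a short Python sketch. It assumes only the standard `unicodedata` module; the variable names are illustrative and not from the thread. It shows that <l, combining cedilla> composes under NFC to the precomposed letter, while an intervening CGJ (U+034F) keeps the sequence decomposed. It says nothing about how any particular font or shaper renders either sequence.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA
CGJ = "\u034F"      # COMBINING GRAPHEME JOINER

plain = "l" + CEDILLA           # <l, combining cedilla>
with_cgj = "l" + CGJ + CEDILLA  # <l, CGJ, combining cedilla>

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

# The plain sequence composes to U+013C LATIN SMALL LETTER L WITH CEDILLA.
print([f"U+{ord(c):04X}" for c in nfc_plain])  # ['U+013C']

# CGJ has combining class 0, so it blocks composition and the
# three code points survive normalization unchanged.
print([f"U+{ord(c):04X}" for c in nfc_cgj])    # ['U+006C', 'U+034F', 'U+0327']
```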

> The conclusion of the ad hoc does not really solve the Marshallese
> problem

Of course it does. It provides characters which will have the cedilla-shape rather than the comma-shape used in Latvian.

> and doesn't even consider the known related cases.

Perhaps not, but I don't know how worried we need to get about ALA-LC romanization display. Do ŗ and r̦ contrast in ALA-LC romanization?

> A font can have a truncated cedilla shape for U+0327, this is
> acceptable or tolerable in some languages like under C in French. If 4
> additional characters were to be encoded for Marshallese, the other
> Marshallese characters with cedilla would use an incorrect cedilla in
> those fonts.

I don't follow you. There's a workaround Marshallese orthography which uses "dot below" in order to avoid the "comma/cedilla" problem. The problem is that all the letters should have the SAME shape below, not two commas and two cedillas. I am sure that in a script or handwriting font a simple "tick" or "truncated cedilla shape" as you call it is acceptable. What's not is that the four letters have differently shaped diacritics.
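
One way to see where the "two commas and two cedillas" split may come from is to look at how the four lowercase base letters behave under NFC. This is a sketch assuming the four Marshallese letters are l, m, n and o with U+0327, and using only Python's standard `unicodedata` module; the names `CEDILLA` and `nfc_lengths` are hypothetical. Two of the letters compose to precomposed characters (the ones shared with Latvian data), and two have no precomposed form and stay as base plus combining cedilla.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA

# Assumption: these are the four lowercase base letters in question.
nfc_lengths = {}
for base in "lmno":
    composed = unicodedata.normalize("NFC", base + CEDILLA)
    nfc_lengths[base] = len(composed)
    code_points = " ".join(f"U+{ord(c):04X}" for c in composed)
    print(base, "->", code_points)

# l -> U+013C
# m -> U+006D U+0327
# n -> U+0146
# o -> U+006F U+0327
```

A font that draws U+013C and U+0146 with a comma below, as Latvian expects, but draws U+0327 itself as a cedilla would then show the mismatch described above.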

> One might say these fonts should be fixed for Marshallese characters
> with U+0327 or a different font should be used, but then the character
> fix is incomplete.
> Other orthographies or transliterations also require a cedilla that
> looks like one.

Which ones are you talking about now?

Look, the right way to fix this would have been to map 8859 to 10646 for Latvian CORRECTLY to comma below instead of to map it MECHANICALLY according to the misnomer character names. But that could have been done in 1990 or 1991. It is too late to do that now, and too late to tinker with Latvian data.

The ad-hoc discussed different scenarios. The one we came up with seemed best. If you have something specific and better, write it up. Do not mention "other orthographies or transliterations" without being precise and giving exact examples. Win the argument with a WG2/L2 document with examples and argumentation. If you make a good case, people will believe you.

In the meantime I have an action to write a document requesting four letters for Marshallese.

Michael Everson
Received on Fri Jun 21 2013 - 02:41:50 CDT
