Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Michael Everson <>
Date: Fri, 21 Jun 2013 14:27:38 +0100

On 21 Jun 2013, at 14:06, Denis Jacquerye <> wrote:

> It is not the character model that is not reliable, it is the application.
> If your application doesn't support locale settings and locale-specific
> font features, fix the application.

Try this in the file system.

>> I don't think it's wise to suggest to 40,000 Marshallese including school children that they should have to formalize the use of an invisible (and ignorable?) character in their orthography.
> This is getting confusing. Do you care about keyboard layout
> implementation or not?

Sure. I just don't think that Marshallese data should be festooned with what would be a high-frequency invisible character.

> Keyboard layouts can (or should) handle 4 additional characters and
> O/o, M/m with combining cedilla. There is no reason why they couldn't
> handle characters with CGJ and combining cedilla, be it with dead
> keys, single keys or whatever.

I know how to do it.

> The user just needs to know they are using a Marshallese keyboard (which they need anyway) or copy and paste those characters from a reference document, everything else can happen at another level.

Yes, and it can be done pretty simply. But I don't think that CGJ should be made a part of Marshallese orthography.
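
For illustration, a minimal Python sketch of what such a spelling would look like in stored data. It assumes the CGJ proposal means the sequence base letter + U+034F + U+0327; the names `plain`, `joined`, `nfc` and `codepoints` are made up for the example. It only uses the standard `unicodedata` module to show that the CGJ stops the cedilla from composing with the base letter under NFC, so the invisible character stays in the text.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER
CEDILLA = "\u0327"  # COMBINING CEDILLA


def nfc(text):
    """Return the NFC normalization of text."""
    return unicodedata.normalize("NFC", text)


def codepoints(text):
    """List the code points of text as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in text]


# l + cedilla: composes to the precomposed Latvian letter U+013C.
plain = "l" + CEDILLA
# l + CGJ + cedilla: hypothetical spelling with the joiner in between.
joined = "l" + CGJ + CEDILLA

print(codepoints(nfc(plain)))    # ['U+013C']
print(codepoints(nfc(joined)))   # ['U+006C', 'U+034F', 'U+0327']

# CGJ is a nonspacing mark with combining class 0, which is why it
# blocks the composition above.
print(unicodedata.category(CGJ), unicodedata.combining(CGJ))  # Mn 0
```

Every affected letter in a text spelled this way would carry one extra code point that has no visible glyph of its own.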

>> Perhaps not, but I don't know how worried we need to get about ALA-LC romanization display. Do ŗ and r̦ contrast in ALA-LC romanization?
> I answered already, as Marshallese it doesn't. But that doesn't mean
> one would not want to have a word or name with comma below next to
> another word or name with cedilla. Imagine a Marshallese text talking
> about Latvian people,

No problem with the new L and N added.

> or a library reference with a romanized Bulgarian Romani name using r̦ and a Latvian name using ŗ.

That is a place where you might find CGJ useful. Not in a practical orthography for a natural language though.

>> I don't follow you. There's a workaround Marshallese orthography which uses "dot below" in order to avoid the "comma/cedilla" problem. The problem is that all the letters should have the SAME shape below, not two commas and two cedillas. I am sure that in a script or handwriting font a simple "tick" or "truncated cedilla shape" as you call it is acceptable. What's not is that the four letters have differently shaped diacritics.
> Trebuchet is not a script or handwriting font and has a "curved tick"
> cedilla and a "curved tick" comma below. The assertion that this is acceptable is what led us here in the first place.

I've just typed out ÇçḐḑĢģĶķĻļM̧m̧ŅņO̧o̧ŖŗŞşŢţ ȘșȚț in Trebuchet MS and I'd say that it's simply badly designed for any language. When combining cedilla is added to MmOo its shape is different than for the Latvian or French character; its D-cedilla and d-cedilla have a square-headed comma; and its Romanian ȘșȚț and Latvian ģ have round-headed commas. The answer here is that this font is already badly designed as far as this feature goes, and it needs to be fixed.
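
As a rough aid for reading that test string, the Python sketch below groups its letters by the combining mark they decompose to under NFD. The function name `group_by_mark` is made up for the example, and the sketch says nothing about how Trebuchet MS or any other font draws the marks; it only shows what the encoding says. At that level, every letter before the space carries COMBINING CEDILLA (U+0327), including the Latvian ones, and only the four Romanian letters carry COMBINING COMMA BELOW (U+0326).

```python
import unicodedata

# The test string from the message above.
SAMPLE = "ÇçḐḑĢģĶķĻļM̧m̧ŅņO̧o̧ŖŗŞşŢţ ȘșȚț"


def group_by_mark(text):
    """Map each combining mark name to the base letters that carry it."""
    groups = {}
    base = None
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            # Combining class 0: treat as the current base character.
            base = ch
        elif base is not None:
            groups.setdefault(unicodedata.name(ch), []).append(base)
    return {name: "".join(bases) for name, bases in groups.items()}


for name, bases in group_by_mark(SAMPLE).items():
    print(name, bases)
```

So the differences in shape described above come from the font's glyphs, not from different combining marks in the data.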

>> Which ones are you talking about now?
> I already mentioned the transliterations using the cedilla. For orthographies, the General Alphabet of Cameroon Languages uses the cedilla under vowels.

These should not be affected by the Latvian legacy though. I see what you are saying, however, about the representation of Dd, Gg, Kk, and Rr with a proper cedilla; cf. Everson Mono:

It might be suggested to encode these plus Ll and Nn as LATIN LETTER BLAH WITH INVARIANT CEDILLA, rather than to call them MARSHALLESE. Is that what you are getting at?

Michael Everson *
Received on Fri Jun 21 2013 - 08:30:42 CDT