Re: Latvian and Marshallese Ad Hoc Report (cedilla and comma below)

From: Denis Jacquerye <>
Date: Thu, 4 Jul 2013 08:43:12 +0100

On Thu, Jul 4, 2013 at 2:42 AM, Lisa Moore <> wrote:
>> And it's a pretty easy guess that there are quite a few more users
>> with Japanese and Chinese filenames in the same file system than
>> users with Latvian and Marshallese filenames in the same file
>> system, both because both Chinese and Japanese are used by many more
>> people than Latvian or Marshallese and because China and Japan are
>> much closer than Latvia and the Marshall Islands.
>> I oppose language-tagging as a mechanism to fix the cock-up of
>> slavishly following 8859 decomposition for cedilla and comma-below.
>> Character encoding is the better way to deal with this.
>> Michael Everson *
> I agree with Michael on this. We have a problem that is a bit more
> complicated than which fonts are used. The Unicode Standard, quite some
> time ago, began to explicitly follow 8859 legacy practice for representing
> Latvian letters such that an n with cedilla would be represented as an n
> with comma below. To paraphrase Michael, we do have a cock-up on our hands.
> Language tagging has not been a viable solution for smaller user
> communities, as they are just not well-supported.

Then the solution is to have language tagging well-supported, or we
might be heading for a cock-up by trying to fix another.

No evidence was presented, no Marshallese expert was cited to show
that these characters with the reference glyphs proposed are the only
acceptable ones or even the preferred ones.
The Marshallese-English Dictionary of 1976 uses the m, n with cedilla
attached on the rightmost stem in the Introduction section but
centered in the Dictionary section.
Its orthography rules, adopted by the RMI Ministry of Education in the
1990s and made official by the Marshallese Language Orthography
(Standard Spelling) Act of 2010, are based on the recommendations of a
committee of Marshallese leaders in 1971, which recommended the use of "a
mark below" (not named, with no shape or more specific position given).
We can assume the obvious, that the mark is a cedilla, but there is
no definition of what that cedilla must look like or where it must
attach. Many languages allow for various forms of cedilla, from
classic to half-ring to comma-like to tick-like, detached or attached,
which are still interpreted as that mark people read as cedilla.

Romanian and Latvian orthographies used the classic cedilla initially
(1800s for Romanian and early 1900s for both) and shifted to the comma
below during the 1900s; it is only later that the classic form became
unacceptable, with a clear decision from Romanian and Latvian
entities, through character change and legislation for Romanian and
preferred glyph change for Latvian.
No evidence of such a decision from Marshallese entities has been provided.
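
For reference, the difference between the Romanian "character change"
and the Latvian "glyph change" is visible in the character data itself.
The following is only a small sketch using Python's standard
unicodedata module; the helper name describe() is mine, not anything
from the report. It prints the name and canonical decomposition of the
old and new Romanian s and of the Latvian n.

```python
import unicodedata

def describe(ch):
    """Return the character name and the names of its canonical decomposition."""
    parts = [chr(int(cp, 16)) for cp in unicodedata.decomposition(ch).split()]
    return unicodedata.name(ch), [unicodedata.name(p) for p in parts]

# U+015F: s with cedilla (older Romanian data), U+0219: s with comma below
# (current Romanian), U+0146: the letter used for Latvian n.
for ch in ("\u015F", "\u0219", "\u0146"):
    name, parts = describe(ch)
    print(f"U+{ord(ch):04X} {name} = " + " + ".join(parts))
```

Romanian moved to a character that decomposes to U+0326 COMBINING COMMA
BELOW, while the Latvian letter still decomposes to U+0327 COMBINING
CEDILLA and only its preferred glyph changed.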

There are plenty of Marshallese documents using U+0327 with adequate
fonts (meaning where it has consistently the same shape), like on, in Bible translations, on and others, and some
documents using the same character with inadequate fonts (for example
mixing comma-like cedilla and classic cedilla).
This actually shows that, given the right font, this is a non-issue.
There is no evidence that the documents using the comma-like cedilla
under all Marshallese letters with mark below are wrong.
If in the future, Marshallese usage shifts to comma-like cedilla (as
in handwriting) we could have n with cedilla (with a comma-like
cedilla) and n with invariant cedilla (with a comma-like cedilla).
There is no guarantee either that U+0327 will have the right shape
(consistent with that of proposed additional characters) under m and
The position of the cedilla also varies in Marshallese documents.
The Marshallese-English Online Dictionary is using the dot below
because: "the cedilla is not yet available in Unicode for every letter
needed. The recommendation of the original Committee on Spelling
Marshallese was that some mark (not necessarily a cedilla) should be
placed beneath heavy consonants to distinguish them from the light
varieties, for example. Thus a dot can serve this purpose equally
well." The same reason is given for not using n-macron but n-tilde.
This means the main issue is the positioning of the cedilla under l,
m, n, o; its shape under l and n is not mentioned anywhere in
MOD. Adding new characters will not solve the problem of the
positioning of cedilla under m or o for Marshallese.
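
My reading of "not yet available in Unicode for every letter" is that
it refers to precomposed characters; that is an assumption on my part,
since MOD does not say so. Under that assumption, a quick check with
Python's unicodedata module shows which of l, m, n, o compose with
U+0327 into a single code point under NFC. The function name
has_precomposed() is just illustrative.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA

def has_precomposed(base):
    """True if base + U+0327 composes to a single code point under NFC."""
    composed = unicodedata.normalize("NFC", base + CEDILLA)
    return len(composed) == 1

for base in "lmno":
    print(base, has_precomposed(base))
```

Where no precomposed form exists, the letter remains a base letter
followed by U+0327, so where the mark sits depends on the font's mark
positioning, whatever is decided for l and n.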

The OFF standard is currently being updated and will include
Marshallese as a possible language tag. This means font developers can
choose which default glyphs those characters have and provide
language- or style-specific glyphs, just as they would for similar
cases such as Chinese and Japanese, or Russian and Bulgarian. The
Marshallese community is already using adequate fonts; language
tagging solutions would be a plus, not the only solution.
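
To make the idea concrete, here is a toy model of language-specific
glyph selection. It is not how any real font or shaping engine is
implemented; the glyph names, the tables and select_glyph() are all
hypothetical, and "mh" and "lv" are simply the ISO 639-1 codes for
Marshallese and Latvian.

```python
# Hypothetical default glyph for U+0146 (comma-like form, as preferred for Latvian).
DEFAULT_GLYPHS = {"\u0146": "ncommaaccent"}

# Hypothetical per-language overrides a font developer might provide.
LOCALIZED_GLYPHS = {
    "mh": {"\u0146": "ncedilla.classic"},
}

def select_glyph(char, lang=None):
    """Pick a language-specific glyph if one exists, else the font default."""
    overrides = LOCALIZED_GLYPHS.get(lang, {})
    return overrides.get(char, DEFAULT_GLYPHS.get(char))

print(select_glyph("\u0146", "lv"))  # falls back to the default glyph
print(select_glyph("\u0146", "mh"))  # uses the Marshallese override
```

Untagged text gets the default glyph, which is why the choice of
default still matters for communities whose text is rarely tagged.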

We are arguing all this based on assumptions, ignoring the solutions
that currently exist or that will exist in the near future.
Could we make sure this is what the Marshallese community wants?
Can we make sure the shape and positions are the preferred ones?

Denis Moyogo Jacquerye
African Network for Localisation
Nkótá ya Kongó míbalé ---
DejaVu fonts ---
Received on Thu Jul 04 2013 - 02:49:34 CDT
