Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Sat, 9 Dec 2017 20:30:17 +0000

On Sat, 9 Dec 2017 16:16:44 +0100
Mark Davis ☕️ via Unicode <unicode_at_unicode.org> wrote:

> 1. You make a good point about the GB9c. It should probably instead be
> something like:
>
> GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
>
> Extend is broader than necessary, and there are a few items that
> have ccc!=0 but not gcb=extend. But all of those look to be
> degenerate cases.

Something *like*.

Gcb=Extend includes ZWNJ and U+0D02 MALAYALAM SIGN ANUSVARA. I believe
these both prevent a preceding candrakkala from extending an akshara -
see TUS Section 12.9 about Table 12-33. I think Extend will have to be
split between starters and non-starters.
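
As a rough illustration of the starter/non-starter split, here is a
minimal Python sketch. It only looks at the canonical combining class
(ccc), since the standard library does not expose Gcb. The CANDIDATES
list and the function name split_by_ccc are hypothetical; the list is
hand-picked from the characters named above plus the candrakkala, and it
is not derived from the actual Gcb=Extend set.

```python
import unicodedata

# Hand-picked characters from the discussion (an assumption, not the
# real Gcb=Extend set): ZWNJ, MALAYALAM SIGN ANUSVARA, and the
# Malayalam virama (candrakkala).
CANDIDATES = ["\u200c", "\u0d02", "\u0d4d"]


def split_by_ccc(chars):
    """Partition chars into starters (ccc == 0) and non-starters (ccc != 0)."""
    starters, non_starters = [], []
    for ch in chars:
        if unicodedata.combining(ch) == 0:
            starters.append(ch)
        else:
            non_starters.append(ch)
    return starters, non_starters


if __name__ == "__main__":
    starters, non_starters = split_by_ccc(CANDIDATES)
    for label, group in (("starter", starters), ("non-starter", non_starters)):
        for ch in group:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {label}")
```

On this small sample, ZWNJ and the anusvara come out as starters, while
the candrakkala is a non-starter, which is the distinction a split rule
would have to rely on.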

I believe there is a problem with the first two examples in Table
12-33. If one suffixed <U+0D15 MALAYALAM LETTER KA, U+0D3E MALAYALAM
VOWEL SIGN AA> to the first two examples, yielding *പാലു്കാ and
*എ്ന്നാകാ, one would have three Malayalam aksharas, not two extended
grapheme clusters as the proposed rules would say. This is different to
Tai Tham, where there would indeed just be two aksharas in each word,
albeit odd-looking - ᨷᩤᩃᩩ᩠ᨠᩣ and ᩑ᩠ᨶ᩠ᨶᩣᨠᩣ. Who's checking the impact of
these changes on Malayalam?
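
To make the count concrete, here is a toy Python sketch of the proposed
GB9c applied to the two suffixed examples. It is not a UAX #29
implementation: the CLASSES table, the class names and the segment
function are all hypothetical and hand-assigned for just the code points
in these two strings, and only the "do not break before a mark" rules
plus the proposed GB9c are modelled.

```python
# Hand-assigned classes for the code points in the two examples only.
# These labels are assumptions for illustration, not real property data.
CLASSES = {
    "\u0d2a": "LinkingConsonant",  # PA
    "\u0d32": "LinkingConsonant",  # LA
    "\u0d28": "LinkingConsonant",  # NA
    "\u0d15": "LinkingConsonant",  # KA
    "\u0d3e": "SpacingMark",       # VOWEL SIGN AA
    "\u0d41": "Extend",            # VOWEL SIGN U
    "\u0d4d": "Virama",            # candrakkala
    "\u0d0e": "Other",             # LETTER E
}


def gb9c_joins(classes, i):
    """Proposed GB9c: (Virama | ZWJ) x Extend* LinkingConsonant."""
    if classes[i] != "LinkingConsonant":
        return False
    j = i - 1
    while j >= 0 and classes[j] == "Extend":
        j -= 1
    return j >= 0 and classes[j] in ("Virama", "ZWJ")


def segment(text):
    """Split text into clusters under the toy rules above."""
    classes = [CLASSES.get(ch, "Other") for ch in text]
    clusters, current = [], text[:1]
    for i in range(1, len(text)):
        is_mark = classes[i] in ("Extend", "SpacingMark", "Virama", "ZWJ")
        if is_mark or gb9c_joins(classes, i):
            current += text[i]
        else:
            clusters.append(current)
            current = text[i]
    if current:
        clusters.append(current)
    return clusters


# The two Table 12-33 examples with <KA, AA> suffixed.
EXAMPLE_1 = "\u0d2a\u0d3e\u0d32\u0d41\u0d4d\u0d15\u0d3e"
EXAMPLE_2 = "\u0d0e\u0d4d\u0d28\u0d4d\u0d28\u0d3e\u0d15\u0d3e"

if __name__ == "__main__":
    for example in (EXAMPLE_1, EXAMPLE_2):
        print(len(segment(example)), segment(example))
```

Under these assumptions both strings come out as two clusters, because
the candrakkala always pulls the following consonant into the same
cluster; that is the result I am questioning for Malayalam.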

Richard.
Received on Sat Dec 09 2017 - 14:30:50 CST
