Counting Devanagari Aksharas
Manish Goregaokar via Unicode
unicode at unicode.org
Thu Apr 20 13:17:05 CDT 2017
I don't think there's consensus.
Given a rendered representation, people seem to uniformly count
conjuncts as multiple aksharas if they are rendered with a visible
halant, and as a single akshara if they are rendered conjoined.
Most Devanagari fonts these days are pretty good at conjoining
consonants. They seem to do so for all common conjuncts, and usually
for most practical (i.e. not ridiculously long) conjuncts. I've never
seen a visible halant in text I've read.
I'm of the opinion that Unicode should start considering Devanagari
(and possibly other Indic) consonant clusters as single extended
grapheme clusters. Yes, sometimes a cluster is not rendered as a single
glyph, but sometimes family emoji will not render as a single glyph
either (if you use skin tones or more than 4 family members) and we
still consider those EGCs.
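To make the idea concrete, here is a rough Python sketch of what "a consonant cluster is one unit" could mean at the code point level. It is not the UAX #29 algorithm and not anything Unicode specifies; the function name split_aksharas and the three attachment rules are my own simplifying assumptions (for instance, any letter right after a virama is attached, even an independent vowel). It ignores rendering entirely, so it cannot tell whether a font shows a visible halant.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

def split_aksharas(text):
    """Split text into approximate aksharas (hypothetical helper, simplified rules)."""
    clusters = []
    prev = ""
    for ch in text:
        cat = unicodedata.category(ch)
        attaches = (
            cat in ("Mn", "Mc", "Me")            # vowel signs, virama, other marks
            or cat == "Cf"                       # ZWJ / ZWNJ stay with what precedes them
            or (prev == VIRAMA and cat == "Lo")  # letter joined to the cluster via virama
        )
        if clusters and attaches:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = ch
    return clusters

# The two spellings from the quoted message, written as escapes.
joined = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u0915\u094B"
with_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"

print(len(split_aksharas(joined)))     # 3  (SH.RII, MAA, N.KO)
print(len(split_aksharas(with_zwnj)))  # 4  (SH.RII, MAA, N, KO)
```

Under these assumptions the ZWNJ is what separates N from KO, which agrees with the 4 versus 3 count in the quoted message for a font that has the SHRA conjunct.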
On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode
<unicode at unicode.org> wrote:
> Is there consensus on how to count aksharas in the Devanagari script?
> The doubts I have relate to a visible halant in orthographic syllables
> other than the first.
> For example, according to 'Devanagari VIP Team Issues Report'
> http://www.unicode.org/L2/L2011/11370-devanagari-vip-issues.pdf, a
> derived form from Nepali श्रीमान् should be written श्रीमान्को
> <U+0936 DEVANAGARI LETTER SHA, U+094D DEVANAGARI SIGN VIRAMA, U+0930
> DEVANAGARI LETTER RA, U+0940 DEVANAGARI VOWEL SIGN II, U+092E
> DEVANAGARI LETTER MA, U+093E DEVANAGARI VOWEL SIGN AA, U+0928
> DEVANAGARI LETTER NA, U+094D, U+200C ZERO WIDTH NON-JOINER, U+0915
> DEVANAGARI LETTER KA, U+094B DEVANAGARI VOWEL SIGN O> and not
> श्रीमान्को <U+0936, U+094D, U+0930, U+0940, U+092E, U+093E, U+0928,
> U+094D, U+0915, U+094B>. Now, if the font used has a conjunct for
> SHRA, I would count the former as having 4 aksharas SH.RII, MAA, N, KO
> and the latter as having 3 aksharas SH.RII, MAA, N.KO.
> If the font leads to the use of a visible halant instead of the vattu
> conjunct SH.RA, as happens when I view this email, would there then be
> 5 and 4 aksharas respectively? A further complication is that the font
> chosen treats what looks like SH, RA as a conjunct; the vowel I appears
> to the left of SH when added after RA (श्रि).
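Since the two spellings look identical in many mail clients, a small Python check of the underlying code points may help. This only uses the standard unicodedata module; the helper name describe is made up for illustration.

```python
import unicodedata

# First spelling from the quoted message (with ZWNJ), written as escapes.
with_zwnj = "\u0936\u094D\u0930\u0940\u092E\u093E\u0928\u094D\u200C\u0915\u094B"
# Second spelling: the same sequence with the ZWNJ removed.
without_zwnj = with_zwnj.replace("\u200C", "")

def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe(with_zwnj):
    print(line)

print(len(with_zwnj), len(without_zwnj))  # 11 10
```

The only difference between the two strings is U+200C ZERO WIDTH NON-JOINER after the second virama; whether SH.RA shows as a conjunct or with a visible halant is decided by the font and cannot be read off the code points.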