Re: Counting Devanagari Aksharas

From: Richard Wordingham via Unicode <>
Date: Thu, 20 Apr 2017 08:49:49 +0100

I was offered the following reply:

> To my knowledge except in Tamil script vowel less consonants in
> written form aren't considered as separate "akshara"s in native
> terminology.

Word-finally they seem to be treated as such. To be more precise, a
final cluster of one or more consonants marked as having no vowel is
treated as one; Sanskrit has a few word-final clusters.

> However for text shaping purposes they will surely have
> to be considered as separate orthographic syllables in Unicode
> terminology since in word end position they can sometimes carry svara
> markers.

The complication comes word-internally. My understanding is that in
non-Indic words in non-Indic languages, consonants that end a phonetic
syllable tend not to be included in an akshara along with the start of
the next syllable. However, that tendency is more evident in scripts
other than Devanagari, which developed in the context of Indic
languages.

Renderers' syllable-recognition algorithms will naturally treat
word-final devowelled sequences as separate units, rather than
associate them with the previous implicit or explicit vowel.
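A minimal Python sketch of that kind of syllable recognition, assuming the common simplification that a virama binds the following consonant into the same unit. The character classes, the `SYLLABLE` pattern and the `orthographic_syllables` function are hypothetical illustrations, not taken from any real renderer; actual shaping engines also handle ZWJ/ZWNJ, reph, font data and many more cases.

```python
import re
from typing import List

# Simplified Devanagari classes (assumption: common letters only; ZWJ/ZWNJ,
# the Vedic Extensions block and many edge cases are deliberately ignored).
CONSONANT = "[\u0915-\u0939\u0958-\u095F\u0978-\u097F]\u093C?"  # optional nukta
INDEP_VOWEL = "[\u0904-\u0914\u0960\u0961\u0972-\u0977]"
MATRA = "[\u093A\u093B\u093E-\u094C\u094E\u094F\u0955-\u0957\u0962\u0963]"
VIRAMA = "\u094D"
# Candrabindu, anusvara, visarga and the svara (accent) marks U+0951..U+0954.
MARKS = "[\u0900-\u0903\u0951-\u0954]*"

SYLLABLE = re.compile(
    # Zero or more consonant+virama pairs, a consonant, then an optional
    # dependent vowel sign or a final virama, then any trailing marks.
    f"(?:{CONSONANT}{VIRAMA})*{CONSONANT}(?:{MATRA}|{VIRAMA})?{MARKS}"
    # An independent vowel with any trailing marks.
    f"|{INDEP_VOWEL}{MARKS}"
    # Fallback: any other non-space character is a unit of its own.
    r"|\S"
)


def orthographic_syllables(text: str) -> List[str]:
    """Split text into simplified orthographic syllables.

    Because a virama only joins a *following* consonant, a word-final
    virama-marked consonant (or cluster) cannot attach to anything after it
    and so comes out as a separate unit, not merged with the previous vowel.
    """
    return SYLLABLE.findall(text)


if __name__ == "__main__":
    for word in ["वाक्", "वाक्य", "भगवन्"]:
        print(word, orthographic_syllables(word))
```

With this rule, वाक् splits as वा + क्, whereas word-internally वाक्य splits as वा + क्य, the devowelled क binding to the following य instead.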

Burmese is a good example of what can happen with a non-Indic language;
in native words, phonetic syllabic boundaries tend to be orthographic
syllable boundaries.

Text-shaping engines like Microsoft's Uniscribe are more complicated.
For scripts with a virama, they seem to assume that the virama may be
a combining operator, and wait for data from the font to decide how
many clusters to form.

One test is where spaces are inserted in a word when it is stretched
out (letter-spaced). Of course, that test can only be applied where human decisions
are involved - otherwise we are just looking at what dominant
renderers are actually doing, rather than looking at what they ought
to be doing.

Received on Thu Apr 20 2017 - 02:51:09 CDT
