RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Oct 08 2003 - 11:45:11 CST


Gautam Sengupta wrote:
> I am no programmer, but surely the rendering engine
> could be tweaked to display a halant/hashant in the
> aforementioned situations? I understand that it won't
> happen *automatically* if we were to use <ZWJ> instead
> of <VIRAMA>. But if you were to take the trouble to do
> the tweaking, you'd then have a completely *intuitive*
> encoding for vowel yaphala sequences,
> <vowel><ZWJ><Y>, instead of oddities like
> <vowel><VIRAMA><Y>.

OK but, then, your <ZWJ> becomes exactly what Unicode's <VIRAMA> has always
been: a character that is normally invisible, because it merges in a
ligature with adjacent characters, but occasionally becomes visible when a
font does not have a glyph for that combination.

But there is one detail which makes your approach much more complicated:
what we have been calling <VIRAMA> is *not* a single character. Every Indic
script has its own: <DEVANAGARI SIGN VIRAMA>, <BENGALI SIGN VIRAMA>, and so
on.
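
To make the point concrete, here is a minimal Python sketch that looks up
the virama of a few scripts by formal Unicode character name. The helper
name virama_for and the choice of three scripts are mine, for illustration
only; the character names themselves are the standard ones.

```python
import unicodedata

# A few Indic scripts; each has its own "<SCRIPT> SIGN VIRAMA" character.
SCRIPTS = ["DEVANAGARI", "BENGALI", "TAMIL"]

def virama_for(script):
    """Return the virama character of the given script (hypothetical helper)."""
    return unicodedata.lookup(f"{script} SIGN VIRAMA")

for script in SCRIPTS:
    ch = virama_for(script)
    # Print the code point and formal name of each script's virama.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Each script gets a different code point, so the script identity is already
carried by the character itself.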

Each one of these characters, when displayed visibly, has a distinct glyph:
a Bangla hashant is a small "/" under the letter, a Tamil virama is a dot
over the letter, etc.

With your approach, the single character <ZWJ> is overloaded with a dozen
different glyphs depending on which script the adjacent letters belong to.
Plus, it still has to be invisible when used in a non-Indic script, such as
Arabic.

Implementing all this is certainly possible, but would result in bigger
look-up tables, for no advantage at all.
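
A rough Python sketch of the extra look-up that a renderer would need under
that proposal. Everything here is an assumption for illustration: the table
VISIBLE_JOINER, the helper names, and the crude way of guessing a script
from the first word of the character name. No real rendering engine is
being described.

```python
import unicodedata

ZWJ = "\u200d"

# Hypothetical table: which visible sign to draw for a ZWJ that could not be
# merged into a ligature, keyed by the script of the preceding letter.
VISIBLE_JOINER = {
    "DEVANAGARI": "\u094d",
    "BENGALI": "\u09cd",
    "TAMIL": "\u0bcd",
}

def script_of(ch):
    """Crude script guess: first word of the Unicode character name."""
    return unicodedata.name(ch, "").split(" ")[0]

def fallback_glyph_for_zwj(text, i):
    """Return the character whose glyph would stand in for the ZWJ at
    text[i], or None if the ZWJ must stay invisible."""
    if i == 0 or text[i] != ZWJ:
        return None
    return VISIBLE_JOINER.get(script_of(text[i - 1]))

# <BENGALI LETTER A><ZWJ><BENGALI LETTER YA>: the joiner needs a Bangla sign.
print(fallback_glyph_for_zwj("\u0985\u200d\u09af", 1) == "\u09cd")
# <ARABIC LETTER BEH><ZWJ><ARABIC LETTER BEH>: the joiner stays invisible.
print(fallback_glyph_for_zwj("\u0628\u200d\u0628", 1) is None)
```

With per-script virama characters none of this is needed: the character
already says which glyph to draw.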

> Perhaps there isn't a *problem* as such, and perhaps
> naturalness and intuitive acceptability aren't *key*
> features of the system, but surely other factors being
> equal they ought be taken into consideration in
> choosing one method of encoding over another?

Yes. But the flaws that I see in the ISCII/Unicode model are much smaller
than you imply. E.g., I agree that it would have been more logical if:

- independent and dependent vowels were the same characters;

- each script was encoded in its natural alphabetical order;

- there were no precomposed and decomposed alternatives for the same
graphemes.

And others, on which perhaps a linguist won't agree, but which would have
made life much easier for programmers:

- all vowels were encoded in visual order, so that no vowel reordering was
necessary;

- "repha ra" were encoded as a separate character, so that no reordering at
all was necessary.
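
For those who have not met the vowel reordering mentioned above, here is a
deliberately simplified Python sketch. The set PRE_BASE and the function
to_visual_order are my own illustration; it moves a left-side vowel sign in
front of the single preceding character only, whereas a real engine has to
move it in front of the whole consonant cluster.

```python
# Vowel signs that are stored after the consonant but drawn to its left:
# DEVANAGARI VOWEL SIGN I, BENGALI VOWEL SIGN I, E and AI.
PRE_BASE = {"\u093f", "\u09bf", "\u09c7", "\u09c8"}

def to_visual_order(text):
    """Move each pre-base vowel sign before the character it follows.

    Simplified on purpose: conjuncts and two-part vowels are ignored.
    """
    out = []
    for ch in text:
        if ch in PRE_BASE and out:
            # Insert the sign just before the last character emitted.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)

# Logical order <KA><VOWEL SIGN I> becomes visual order <VOWEL SIGN I><KA>.
logical = "\u0995\u09bf"
visual = to_visual_order(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

Had the vowel signs been encoded in visual order, this step would simply
not exist.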

But, all summed up, living with these little flaws is *much* simpler than
trying to change the rules of a standard a dozen years after people started
implementing it.

_ Marco


