RE: unicode and malayalam

From: Marco.Cimarosti@icl.com
Date: Thu Dec 02 1999 - 14:59:06 EST


Raj is from Kerala and, as could be expected, he seems to be right about his
own language.

I think that, in Malayalam, there is no such thing as "half
consonants", as there are in Devanagari or Gurmukhi.

So it is not very clear what the rendering implication of ZWJ is in
Malayalam. Probably, for this script, ZWJ should simply have no visible
effect. (It is a little clearer what ZWNJ should be for: to prevent
ligatures and explicitly show the virama.)
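
To make the two cases concrete, here is a minimal sketch of what the
sequences look like in the backing store. Python is used only for
illustration, and the CA+CA pair is taken from Raj's question. The comments
describe the intent of each sequence as discussed in this thread; nothing
here says what a particular font or engine will actually display.

```python
import unicodedata

CA = "\u0D1A"      # MALAYALAM LETTER CA
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Plain sequence: the renderer may pick a conjunct if the font has one.
plain = CA + VIRAMA + CA

# ZWNJ after the virama: a request for no ligature, i.e. a visible virama.
explicit_virama = CA + VIRAMA + ZWNJ + CA


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


for label, seq in (("plain", plain), ("explicit virama", explicit_virama)):
    print(label, describe(seq))
```

The only difference between the two strings is the extra control character,
which is why its intended effect needs to be documented for each script.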

Mark is right in saying that the rendering of Indic (or any other) scripts is
mostly a function of the font and/or of the rendering engine. Yet concepts
like ZWJ, ZWNJ and, to some extent, the VIRAMA are NOT actual features of
the Indic scripts: they are abstract concepts introduced by Unicode (or,
well, inherited from ISCII). So it should be the task of Unicode to
document in more detail how these controls contribute to encoding
text in ALL these scripts, and how this encoded text is supposed to be
rendered.

>> In fact, I feel that the Unicode book (2.0, I mean) is a little bit short
on information about Indic scripts other than Devanagari and Tamil. I
would urge Unicode to expand, in future versions of the book, the sections
about these non-trivial scripts (I am talking about pages 6-45 to 6-48 and
6-54 to 6-56, version 2.0) <<

This does not mean that Unicode should define all the tiny bits of how
binary Unicode maps to displayed or printed output in Indic languages: this
would not be possible, and would violate the freedom of developers and font
designers.

However, if a native speaker like Raj does not understand some aspects of
how his own language is encoded, it is clearly not because he does not know
the script (!), but rather because something is not clear with the encoding.

I would like to point out another point that, to my mind, is not clear enough
in Unicode's treatment of Indic scripts.

The Devanagari section of the Unicode book (2.0) explains in some detail how
"dead consonants" (i.e. consonant+<VIRAMA>) are often displayed as "half
consonants" (i.e. special glyphs that are missing the right-most part of the
letter shape: typically without the vertical bar).

Simplifying, when in Devanagari you have a sequence like <MA><VIRAMA><MA>,
the second <MA> is displayed with its nominal ("full") glyph, while the
first <MA> and the <VIRAMA> form a ligature and are displayed with the
special "half" glyph:

        Devanagari: backstore <MA><VIRAMA><MA> -> display <MA-half><MA-full>

However, many Indic scripts (e.g. Telugu) work exactly the opposite way: the
first <MA> is displayed with its nominal ("full") glyph, while the <VIRAMA>
and the second <MA> form a ligature and are displayed with a special
"subscript" glyph. "Subscript consonants" are combining glyphs that sit
*below* the preceding glyph (like a cedilla); they look similar to the
normal glyphs but are smaller in size and sometimes lack the top part of
the letter shape:

        Telugu: backstore <MA><VIRAMA><MA> -> display <MA-full><MA-combining-subscript>
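
The contrast between the two mappings above can be sketched as a toy glyph
selector. This is only an illustration under loud assumptions: the symbolic
names ("MA", "VIRAMA"), the glyph suffixes, the "VIRAMA-sign" glyph, and the
shape() function are all hypothetical; real engines also handle conjunct
ligatures, reordering, vowel signs and so on. ZWJ and ZWNJ are left out on
purpose, since their effect in "subscript" scripts is exactly what is
unclear.

```python
VIRAMA = "VIRAMA"


def shape(codes, style):
    """Map symbolic code names to symbolic glyph names (toy model).

    style="half":      C VIRAMA C -> C-half, C-full  (Devanagari-like)
    style="subscript": C VIRAMA C -> C-full, C-combining-subscript  (Telugu-like)
    """
    if style not in ("half", "subscript"):
        raise ValueError("unknown style: " + style)
    glyphs = []
    i = 0
    while i < len(codes):
        code = codes[i]
        nxt = codes[i + 1] if i + 1 < len(codes) else None
        after = codes[i + 2] if i + 2 < len(codes) else None
        if (style == "half" and code != VIRAMA and nxt == VIRAMA
                and after not in (None, VIRAMA)):
            # The dead consonant (consonant + virama) becomes a half form.
            glyphs.append(code + "-half")
            i += 2
        elif (style == "subscript" and code == VIRAMA and glyphs
                and nxt not in (None, VIRAMA)):
            # Virama + following consonant becomes a subscript form.
            glyphs.append(nxt + "-combining-subscript")
            i += 2
        elif code == VIRAMA:
            # Nothing to combine with: show the virama sign itself.
            glyphs.append("VIRAMA-sign")
            i += 1
        else:
            glyphs.append(code + "-full")
            i += 1
    return glyphs


print(shape(["MA", "VIRAMA", "MA"], "half"))       # ['MA-half', 'MA-full']
print(shape(["MA", "VIRAMA", "MA"], "subscript"))  # ['MA-full', 'MA-combining-subscript']
```

The same backing-store sequence yields a ligature on the left in one style
and on the right in the other, so a rule written for Devanagari cannot simply
be reused.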

This feature, which is common to about half of the Indic scripts, is
totally ignored in the Unicode book (2.0, again). This omission is not
friendly to readers (it could mislead non-Indian developers into writing
wrong or incomplete rendering engines), but Unicode could
say: "Hey, it's not our business to teach you Telugu: go out and buy a
grammar book."

OK, right, everybody should do their homework. But the real problem is that
the Unicode book does not investigate and explain the effects of controls
like ZWJ and ZWNJ on the scripts that use "subscript consonants". This
effect is not obvious, it cannot be "the same as in Devanagari", and I
cannot find such information in a grammar book!

Ciao.
        Marco

> -----Original Message-----
> From: Mark Leisher [SMTP:mleisher@crl.nmsu.edu]
> Sent: 1999 December 02, Thursday 18.53
> To: Unicode List
> Cc: Unicode List
> Subject: Re: unicode and malayalam
>
>
> RajKumar> hello all, i have some confusion regarding the ZWJ ZWNJ and
> RajKumar> their use wrt malayalam.
>
> RajKumar> 1. as far as i know half consonents are absent in malayalam. so
> RajKumar> what is the effect of a ZWJ in between two consonents. insted of
> RajKumar> the half forms we normally use the virama between two consonents
> RajKumar> that do not have a seperate glyph, and this can be done by using
> RajKumar> the ZWNJ
>
> RajKumar> 2. In malayalam their can be two equally valid ways of
> RajKumar> representing the glyphs of some consonent conbination. like
> RajKumar> CA+CA. is their any way of identifying the exact glyph form.
>
> As far as I know, Malayalam should work much as Devanagari does. The virama
> inhibits the vowel, and a following ZWJ or another consonant will cause it to
> take half-consonant form.
>
> But in cases where you have Consonant+Virama+Consonant, *before* selecting the
> half-consonant form, the rendering code is responsible for determining if this
> pair (or group in some cases) of consonants should be combined to form a
> vertical conjunct or ligature.
>
> There is no "standard" way of determining which glyph to use. It is entirely
> dependent on what glyphs are available in the font and how much the rendering
> code knows about the glyphs in the font.
> -----------------------------------------------------------------------------
> Mark Leisher
> Computing Research Lab            I have never made but one prayer to God,
> New Mexico State University       a very short one:
> Box 30001, Dept. 3CRL             "Oh Lord, make my enemies ridiculous."
> Las Cruces, NM 88003              And God granted it.  -- Voltaire, letter


