Malayalam Half-U: how

From: Antoine LECA (
Date: Fri Nov 08 2002 - 18:29:54 EST

    Hi folks,

    A problem was reported on the Microsoft VOLT mailing list (this list
    is supposed to be dedicated to typography, but it appears to deal
    more with Indic scripts, because VOLT is the MS tool used to encode
    OpenType information in a font, which in turn is required to display
    Indic scripts on Windows.)

    The problem concerns the Malayalam half-u. A user reported as an error
    the fact that Uniscribe displays a dotted circle in the middle of a
    Malayalam half-u. He wrote
            U+0D15 U+0D41 U+0D4D (ka, u, virama)
    and Uniscribe displayed (in reformed style) the ku syllable, then a
    dotted circle, then a virama sign hanging alone.
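
    For reference, the reported sequence can be inspected with Python's
    standard unicodedata module. This is only an illustrative sketch; the
    names half_u_sequence and describe are mine, not from the VOLT thread.

```python
import unicodedata

# The sequence reported by the user: ka, vowel sign u, virama.
half_u_sequence = "\u0D15\u0D41\u0D4D"

def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

# Print one line per character of the sequence.
for row in describe(half_u_sequence):
    print(*row)
```

    The point of interest is that the virama directly follows a vowel
    sign rather than a consonant letter.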

    Of course, the problem is that Uniscribe expects virama to come only
    after consonants, so it displayed the sequence as an error. But I
    believe the misunderstanding hides a real problem: how can the half-u
    be displayed? Hence I am coming here to see what the gurus think.
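
    The rule "virama only after a consonant" can be sketched as below.
    This is a hypothetical check written for illustration, not Uniscribe's
    actual logic; the consonant range U+0D15..U+0D39 (ka to ha) and the
    function names are my assumptions.

```python
# Assumed Malayalam consonant range, ka (U+0D15) to ha (U+0D39).
KA, HA = 0x0D15, 0x0D39
VIRAMA = "\u0D4D"

def is_consonant(ch):
    """True if ch falls in the assumed Malayalam consonant range."""
    return KA <= ord(ch) <= HA

def stray_viramas(text):
    """Return indexes of viramas that do not directly follow a consonant."""
    return [
        i for i, ch in enumerate(text)
        if ch == VIRAMA and (i == 0 or not is_consonant(text[i - 1]))
    ]

# ka + virama passes; ka + u matra + virama is flagged at index 2,
# which is where a renderer applying this rule would show a dotted circle.
print(stray_viramas("\u0D15\u0D4D"))
print(stray_viramas("\u0D15\u0D41\u0D4D"))
```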

    To help you, I have done some research. Here is what I have found.

    First, the phonetic reality: the root case is a word ending with
    halanta (virama); while in other languages this "kills" the a-sound,
    in Malayalam it rather replaces it with the half-u sound, particularly
    when the consonant is a conjunct.
    This is described, for example, in the ISO 15919 standard, available
    with detailed explanations at Dr Anthony P. Stone's site,

    According to Varamozhi (a site well informed about Malayalam),
    when it comes to representation, there are differing writing
    "styles" for this single phonetic reality; in North Kerala, the
    usage is simply to write the halanta sign in place, and done!
    Obviously, this is very much in line with the other scripts.

    However, in South Kerala, as Mr. Cibu said, the usage is to write the
    halanta sign and also to show the matra for the u vowel.
    While this latter usage is said to occur with the reformed
    style, I have seen examples with the traditional style as well
    (although this is from a book printed in Madras, so it might be wrong.)
    Obviously, the user of Uniscribe intended to display this combination,
    which to him is the normal way to display a word, when it ends with

    Knowing that, we can now notice that Unicode has a note under Malayalam
    virama (U+0D4D), saying it is the same as Malayalam half-u. To me, this
    means that under Unicode, the half-u is supposed to *not* be specifically
    encoded, and only the usage from North Kerala is supposed to be followed.

    Other relevant information: ISCII-91 seems silent on the subject,
    and the CDAC products (like iLeap) seem unable to render the half-u
    in Malayalam (unless one "cheats" using the INV pseudo-consonant.)

    It is too late to discuss the pros and cons of the choice Unicode made
    back in 1992 (probably, they chose to unify the encoding as far as
    possible, in order to ease sorting and similar tasks.)
    Now, the problem is: if someone wants to specifically encode the
    showing of the u matra, in a context (such as Uniscribe) where both
    usages, from North and South Kerala, could be intended, how should it
    be done? It then seems rather natural to use the combination
                      U+0D41 U+0D4D,
    following the precedent established in Unicode 3.1 (IIRC) for the modern
    Bengali A and E initial vowels (in words borrowed from English), which
    are written as Bengali A or E, followed by virama then ya (hence an
    exception to the rule that virama may only follow a consonant.)

    Are the gurus here OK with this "solution"?

    Can it be "sanctified", for example with the inclusion of the adequate
    words in some revision of Unicode?

    If this is agreed, when dealing with aspects other than rendering,
    people should take this into account, and effectively ignore the U+0D41
    when followed by U+0D4D, when the task is about searching, sorting, etc.
    While this is a nuisance, it does not appear completely prohibitive to
    me. But I admit I have not thought a lot about the consequences of
    allowing such "presentation encoding."
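
    To give an idea of the cost, ignoring U+0D41 before U+0D4D could be
    done as a folding step ahead of comparison. The sketch below is only
    an assumption about how one might do it; fold_half_u is a made-up
    name, and nothing like it is specified by Unicode.

```python
import re

U_MATRA = "\u0D41"   # MALAYALAM VOWEL SIGN U
VIRAMA = "\u0D4D"    # MALAYALAM SIGN VIRAMA

# Match a u matra only when a virama immediately follows it.
_PRESENTATION_U = re.compile(U_MATRA + "(?=" + VIRAMA + ")")

def fold_half_u(text):
    """Drop the u matra before a virama, for searching and sorting only."""
    return _PRESENTATION_U.sub("", text)

south = "\u0D15\u0D41\u0D4D"   # ka, u matra, virama (South Kerala usage)
north = "\u0D15\u0D4D"         # ka, virama (North Kerala usage)

# After folding, both spellings compare equal.
print(fold_half_u(south) == fold_half_u(north))
```

    A plain ku (U+0D15 U+0D41) with no virama after it is left untouched,
    so only the "presentation" use of the matra is folded away.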

