Re: Malayalam Half-U: how

From: Baiju M (
Date: Tue Nov 12 2002 - 04:32:36 EST

    In Malayalam (ISO 639-2 language code: mal) there are 37 'vyanchanangal'
    (consonants). All of these consonants are usually pronounced with the
    support of a 'swaram' (vowel) sound, A [U0D05]. The pure form of a
    consonant is written with a 'chandrakkala' (virama [U0D4D]) above the
    consonant. When the pure form of a consonant is pronounced, there should
    be a clear sound of the vowel U [U0D09]. Some consonants have another
    form, which is called 'chillu'.
    A 'chillu' is a consonant that does not require any vowel support to
    pronounce. It is written with a vowel sign U [U0D41] and a 'chandrakkala'
    (virama [U0D4D]) above that. In fact, Malayalam has separate 'lipi'
    (scripts) for the 7 chillu forms of consonants that are widely used in
    Malayalam. Since we have separate scripts for most of the chillus, we
    have almost stopped writing the chillu forms of other consonants (which
    rarely occur) in the way explained above, although you can still see
    some texts written in this style.
    Antoine said this is the half form of u, that is, the 'samvrutokaram'
    of U [U0D09]. (In fact, 'samvrutokaram' has a sound of both A and U, so
    the virama, the vowel sign U, and the combination of the two are each
    used in different places and texts; some linguists say that
    'samvrutokaram' has a vowel value.) Many people now write consonants
    with a virama for the chillu forms of other consonants. One example is
    the one Antoine gave: U0D15 + U0D41 + U0D4D (ka, u, virama).
    So internally a chillu can be represented with a Unicode character
    sequence like this:
    <consonant> + <vowel sign U [U0D41]> + <virama [U0D4D]>.
    Then you can render the 7 chillu forms with the correct script. I will
    explain how to do this below.
    To make input easy, you can use the Inscript keyboard layout
    standardised by the Kerala govt. (They simply added the chillus to the
    original Inscript keyboard layout at appropriate positions, taking into
    account the frequency of occurrence of these chillu forms. I will
    explain the drawback of this keyboard layout below.)
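
    As a minimal sketch of the <consonant> + <vowel sign U> + <virama>
    sequence described above, the following Python builds and recognises
    it. The function names are only illustrative, and the consonant range
    check (U+0D15 to U+0D39) is an assumption that is slightly too
    generous, since it also admits any unassigned code point in that range.

```python
# Code points named in the message above.
VOWEL_SIGN_U = "\u0d41"  # Malayalam vowel sign U
VIRAMA = "\u0d4d"        # Malayalam virama (chandrakkala)
KA = "\u0d15"            # Malayalam letter KA


def half_u_sequence(consonant: str) -> str:
    """Return <consonant> + <vowel sign U> + <virama> as one string."""
    return consonant + VOWEL_SIGN_U + VIRAMA


def is_half_u_sequence(text: str) -> bool:
    """True if text is exactly one consonant followed by U+0D41 U+0D4D."""
    return (
        len(text) == 3
        # Assumption: treat U+0D15..U+0D39 (KA..HA) as the consonant range.
        and "\u0d15" <= text[0] <= "\u0d39"
        and text[1:] == VOWEL_SIGN_U + VIRAMA
    )


if __name__ == "__main__":
    seq = half_u_sequence(KA)
    print([f"U+{ord(ch):04X}" for ch in seq])
```

    Run as a script, this prints the three code points of the (ka, u,
    virama) example from Antoine's message.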

    The proposal to include the scripts for the chillu forms of consonants
    as basic characters should not be accepted by the Unicode Consortium.
    (It is going to be submitted, or has already been submitted, by the
    Ministry of Information Technology (Govt. of India), a member of the
    Unicode Consortium.) The proposal includes some other things; in my
    opinion those changes should be accepted.

    Now I will explain how to represent the chillu forms of consonants in
    Unicode. An important thing to notice is that two (or more) consonants
    may have the same script for their chillu forms, and the pronunciation
    is also the same. Even so, each should be represented by its correct
    Unicode sequence. The script for the chillu forms of RA [U0D30] and RRA
    [U0D31] is the same. Similarly, the script for the chillu forms of LLA
    [U0D33] and LLLA [U0D34] is the same. The other consonants, whose chillu
    forms have unique scripts, are NNA [U0D23], NA [U0D28] and LA [U0D32].
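
    The relation between the 7 consonants and the 5 chillu scripts listed
    above can be written down as a small table. This is only a sketch: the
    code points are the ones given in this message, but the shape labels
    ("chillu-R" and so on) and the function name are hypothetical names
    made up for illustration, not glyph names from any font.

```python
from collections import defaultdict

# Consonant code points from the message; shape labels are hypothetical.
CHILLU_SHAPE = {
    "\u0d23": "chillu-NN",  # NNA, unique shape
    "\u0d28": "chillu-N",   # NA, unique shape
    "\u0d32": "chillu-L",   # LA, unique shape
    "\u0d30": "chillu-R",   # RA, shares its shape with RRA
    "\u0d31": "chillu-R",   # RRA, shares its shape with RA
    "\u0d33": "chillu-LL",  # LLA, shares its shape with LLLA
    "\u0d34": "chillu-LL",  # LLLA, shares its shape with LLA
}


def consonants_by_shape() -> dict:
    """Group the consonant code points by the chillu shape they draw."""
    groups = defaultdict(list)
    for consonant, shape in CHILLU_SHAPE.items():
        groups[shape].append(f"U+{ord(consonant):04X}")
    return dict(groups)


if __name__ == "__main__":
    for shape, consonants in consonants_by_shape().items():
        print(shape, consonants)
```

    The table goes from consonant to shape and not the other way round,
    because a shape alone does not tell you which of the paired consonants
    was meant.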

    Why should the 5 scripts for the 7 chillu forms of consonants not be
    included in Unicode?

    * The basic reason is that those 5 'lipi' (scripts) are not part of the
       Malayalam 'Aksharamala' (character set). Instead, these are chillus
       only. (Note that a chillu is not a consonant conjunct.)

    Supporting reasons :-

      + As I explained above, two (or more) consonants use the same script
         for their chillu forms. So if these 'simple shapes' become part of
         Unicode, hard encoding of chillus will be impossible. If someone
         inputs the correct Unicode sequence, the renderer should render
         those characters, this will make more

      + Sorting rules cannot be implemented effectively.

    Inscript keyboard layout problems :-

       I think the drawback of the new Inscript keyboard layout standardised
    by the Kerala govt. will be clear from the above discussion. Even so,
    the layout can be accepted on practical grounds. Since we are only using
    those scripts, we can assign any character sequence to the keys
    allocated to them. Here the choice is between the RA [U0D30] and RRA
    [U0D31] chillu, and between LLA [U0D33] and LLLA [U0D34]. Considering
    the accent of pronunciation and the frequency of occurrence of these
    chillus, you can choose RRA [U0D31] and LLA [U0D33]. In fact, this can
    only be decided by considering the words.
    For example :-
    RA [U0D30] + vowel sign U [U0D41] + virama [U0D4D] is correct in words :
    neer - neere (water), avar - avare (they), aar - aare (who) etc.

    and RRA [U0D31] + vowel sign U [U0D41] + virama [U0D4D] is correct in words :
    car - caRe (car), kiNar - kiNaRe (well), sir - saRe (sir) etc.

    So if someone inputs the other correct sequences (without using those
    keys), they should still render properly.
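
    A rough sketch of this behaviour in Python, under stated assumptions:
    the key names and shape labels are hypothetical, and the "renderer" is
    only a toy that replaces each <consonant> + U0D41 + U0D4D run with a
    label. It is meant to show that the key output and the other correct
    sequence end up with the same shape, not how a real shaping engine
    works.

```python
U_SIGN, VIRAMA = "\u0d41", "\u0d4d"
RA, RRA, LLA, LLLA = "\u0d30", "\u0d31", "\u0d33", "\u0d34"

# Hypothetical key names; each chillu key emits one fixed sequence,
# here the RRA and LLA choice suggested in the message.
KEY_OUTPUT = {
    "chillu_r_key": RRA + U_SIGN + VIRAMA,
    "chillu_ll_key": LLA + U_SIGN + VIRAMA,
}

# Both members of a pair draw the same shape (labels are hypothetical).
SHAPE_FOR = {RA: "chillu-R", RRA: "chillu-R", LLA: "chillu-LL", LLLA: "chillu-LL"}


def render(text: str) -> list:
    """Replace each <consonant> + U+0D41 + U+0D4D run with its shape label."""
    out = []
    i = 0
    while i < len(text):
        if text[i] in SHAPE_FOR and text[i + 1 : i + 3] == U_SIGN + VIRAMA:
            out.append(SHAPE_FOR[text[i]])
            i += 3  # consume consonant, vowel sign U and virama together
        else:
            out.append(text[i])
            i += 1
    return out


if __name__ == "__main__":
    typed_with_key = KEY_OUTPUT["chillu_r_key"]
    typed_by_hand = RA + U_SIGN + VIRAMA
    print(render(typed_with_key), render(typed_by_hand))
```

    The stored text still differs (RRA in one case, RA in the other), so
    the distinction between the two consonants is kept even though the
    drawn shape is the same.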

    P.S : please reply to

    Baiju M

    --- In unicode@y..., Antoine LECA <Antoine10646@l...> wrote:
    > Hi folks,
    > A problem was reported on the Microsoft VOLT mailing list (this list
    > should be dedicated to typography, but it appears that it deals
    > more with Indic scripts, because VOLT is the MS tool used to encode
    > OpenType information in a font, which in turn is required to display
    > Indic scripts on Windows.)
    > The problem concerns the Malayalam half-u. A user reported as an error
    > the fact that Uniscribe displays a dotted circle in the middle of a
    > Malayalam half-u. He wrote
    >         U+0D15 U+0D41 U+0D4D  (ka, u, virama)
    > and Uniscribe displayed (in reformed style) the ku syllable, then a
    > dotted circle, then a virama sign hanging alone.
    > Of course, the problem is that Uniscribe expects virama to come only
    > after consonants, so it displayed it as an error. But I believe the
    > misunderstanding hides a real problem: how can the half-u be displayed?
    > Hence I am coming here to see what the gurus believe about this.
    > To help you, I have done some research. Here is what I have found.
    > First, the phonetic reality: the root is when a word ends with halanta
    > (virama); while in other languages, this "kills" the a-sound, in
    > Malayalam it rather replaces it with the half-u sound, particularly
    > when the consonant is a conjunct.
    > This is for example described in the ISO 15919 standard, available
    > with detailed explanations at Dr Anthony P. Stone site,
    > <URL:>
    > According to Varamozhi (a site well informed about Malayalam),
    > <URL:>
    > when it comes to representation, there exists differing writing
    > "styles" contemplating this single phonetic reality; in North
    > Kerala, usage is to write the halanta sign in place, and Done!
    > Obviously, this is very much in line with the other scripts.
    > However, in South Kerala, as Mr. Cibu said, usage is to write the
    > halanta sign as well as to show the matra for the u vowel.
    > While it is said that this latter usage occurs with the reformed
    > style, I have seen examples with the traditional style as well
    > (although this is from a book printed in Madras, so it might be wrong.)
    > Obviously, the user of Uniscribe intended to display this combination,
    > which to him is the normal way to display a word, when it ends with
    > halanta!
    > Knowing that, we can now notice that Unicode has a note under Malayalam
    > virama (U+0D4D), saying it is the same as Malayalam half-u. To me, this
    > means that under Unicode, the half-u is supposed to *not* be specifically
    > encoded, and only the usage from North Kerala is supposed to be followed.
    > Other relevant information: ISCII-91 seems silent on the subject,
    > and the CDAC products (like iLeap) seem unable to render the half-u
    > in Malayalam (unless one "cheats" using the INV pseudo-consonant.)
    > It is too late to discuss the pros and cons of the choice of Unicode,
    > back in 1992 (probably, they chose to ease as far as possible the
    > unification of encoding, in order to ease sorting and similar tasks.)
    > Now, the problem is, if someone wants to specifically encode the
    > showing of the u matra, in a context (like Uniscribe) where both
    > usages from North and South Kerala could be intended, how should it be
    > done? It seems rather natural to then use the combination
    >                   U+0D41  U+0D4D,
    > following the precedent established in Unicode 3.1 (IIRC) for the modern
    > Bengali A and E initial vowels (from English borrowed words), which are
    > written as Bengali A or E, followed by virama then ya (hence an exception
    > to the rule that virama may only follow a consonant.)
    > Are the gurus here OK with this "solution"?
    > Can it be "sanctified", for example with the inclusion of the adequate
    > words in some revision of Unicode?
    > If this is agreed, when dealing with aspects other than rendering,
    > people should take this into account, and effectively ignore the U+0D41
    > when followed by U+0D4D, when the task is about searching, sorting, etc.
    > While this is a nuisance, it does not appear completely prohibitive to
    > me. But I admit I have not thought a lot about the consequences of
    > allowing such "presentation encoding."
    > Regards,
    > Antoine
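
    Antoine's suggestion of ignoring U+0D41 when it is followed by U+0D4D
    for searching and sorting could look roughly like the Python below.
    This is a sketch only: the function name is made up, and a real
    collation key would need much more than this one substitution.

```python
import re

# Match vowel sign U (U+0D41) only when virama (U+0D4D) follows it.
_HALF_U = re.compile("\u0d41(?=\u0d4d)")


def search_key(text: str) -> str:
    """Drop U+0D41 before U+0D4D so both spellings of half-u compare equal."""
    return _HALF_U.sub("", text)


if __name__ == "__main__":
    south = "\u0d15\u0d41\u0d4d"  # ka, u, virama
    north = "\u0d15\u0d4d"        # ka, virama
    print(search_key(south) == search_key(north))
```

    A vowel sign U that is not followed by a virama is left alone, so
    ordinary "ku" syllables are not affected.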
