Re: Clarifications on Thamizh Character Set Standardisati

Date: Mon May 29 2000 - 17:20:29 EDT

Dear Padmaxi,

This is not a complete response to the questions that you raised.

1/ Some of the Tamil character codes are not in the sequence they ought to be
in for sorting purposes. This problem can be taken care of either by additional
s/w functionality to put this right or by actual physical relocation of
Tamil characters within the Unicode definition. A joint announcement by the
Unicode Consortium and the representative associate member (TN Gov).
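
To illustrate the first option (additional s/w functionality), here is a
minimal sketch in Python. It assumes the traditional order of the 18 Tamil
consonants and leaves every other character in plain code order. The names
TRADITIONAL_CONSONANTS and tamil_sort_key are made up for this example and
are not part of any standard or library.

```python
# Code points of the 18 Tamil consonants, listed in traditional alphabet order
# (assumed here for illustration): ka nga ca nya tta nna ta na pa ma ya ra la
# va llla lla rra nnna.
TRADITIONAL_CONSONANTS = [
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
    0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
    0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9,
]

# Rank of each consonant = its position in the traditional list.
RANK = {chr(cp): i for i, cp in enumerate(TRADITIONAL_CONSONANTS)}

FIRST_CONSONANT = 0x0B95  # code of KA, used as a common anchor


def tamil_sort_key(word):
    """Hypothetical sort key: consonants compare by traditional rank,
    all other characters compare by their code point."""
    key = []
    for ch in word:
        if ch in RANK:
            key.append((FIRST_CONSONANT, RANK[ch]))
        else:
            key.append((ord(ch), 0))
    return key


if __name__ == "__main__":
    letters = ["\u0ba9", "\u0bb1", "\u0bb0", "\u0ba8"]
    print(sorted(letters))                      # plain code order
    print(sorted(letters, key=tamil_sort_key))  # traditional order
```

This only reorders the consonants. It does not address the vowel signs or
any other question of ordering, so it should be read as a sketch of the idea
rather than a complete sorting scheme.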

2/ As it stands, your suggested sorting proposal does not apply to Tamil.
This is because there are two categories of character codes found in Unicode
for Indic. To the outside world these are categorised as chars and
ligatures. To a computer (internal world!) I define these as Primary
characters and Virtual characters. The virtual characters do not count for
any fundamental programming, such as sorting. Typical examples of Tamil
virtual characters are consonant-u and consonant-uu. These are real and
independent characters in the outside world. But inside the computer these
characters are not known, except for rendering purposes.
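
As a small illustration of this point, the Python sketch below lists what
the computer actually stores for consonant-u. "Primary" and "Virtual" are my
own terms, and the helper name stored_codes is made up for this example.
The only library used is Python's standard unicodedata module.

```python
import unicodedata


def stored_codes(text):
    """Return (code point, Unicode name) for each stored character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# "ku" as a reader sees it: one letter on screen or paper.
ku = "\u0b95\u0bc1"

if __name__ == "__main__":
    # Internally there is no single code for "ku": only the consonant
    # and the vowel sign are stored, and the renderer joins them.
    for code, name in stored_codes(ku):
        print(code, name)
    print("stored characters:", len(ku))
```

Any sorting routine sees only these stored codes, never the joined shape
that appears on the screen.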


<< Dear Friend,
     I have some doubts to be clarified regarding the subject of Thamizh
 character set standardisation. I trust that you will reply to me when you find
 free time. If you are very new to this subject, please be aware of this, and
 try to contribute your level best to one of the divine languages, Thamizh. I
 am listing my ideas, suggestions and doubts in the following.
     1. I noticed that the Thamizh character ordering in both of the
 character sets is not quite proper. If we use either of the character
 encodings for computations such as alphabetic sorting (akara varisai) or
 index searching, we may not get the desired result. This is because in UNICODE
 'NNa' is followed by 'na' and 'ra' is followed by 'RRa', which is not the
 order of the Thamizh alphabet. In the case of TSCII TAB glyph encoding, we
 still have some problems: for example, the vowels are listed after the
 combination consonant list, which is followed by the pure consonants. I think
 it should be like,
         a. Numerals
         b. Vowels & ayutham
         c. Prefix modifiers
         d. Pure consonants (ka, nga, cha...., sa, sha..) (without any
         e. NNaa, Naa, Raa
         f. Postfix modifiers
         g. Combination modifier symbol (for Ea, Eaa, O, OO, Ow) (not
 strictly necessary) (found in ISCII & UNICODE)
         h. Combination consonants, in alphabetic order
         i. Special characters and punctuations.
     If the character sequence is in the above order there will be no problem
 from the computation and word processing point of view. I wonder why such an
 orderly convention is not followed in both of the well-analysed standards. I
 am very much interested to know about this.
     2. I came to know that the Thamizh Nadu Govt is one of the associate
 members of the Unicode Consortium. So I am interested to know about any
 actions that are being taken towards the adoption of TSCII-based character
 ordering in UNICODE.
     Maybe I am very late in saying these things, but I am very
 interested in this area and only came to know in detail about the
 standardisation and related matters in recent days.
     Also I am sending you the Thamizh character order that I have in mind
 (JPG file) as an attachment. Maybe it will help us in
 taking a decision towards the standardisation.
     Expecting your informative reply.
                                                    Thanking You
 Yours Sincerely
 - R.Padmakumar.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:03 EDT