Re: Tamil and Unicode

From: Jeroen Hellingman (jehe@kabelfoon.nl)
Date: Sun Jun 06 1999 - 04:18:11 EDT


>The drawback in the delay is because the allocated code positions are not
>sufficient for the professional use of Tamil script. Lots of enthusiasts are
>working on ways to fudge the situation. Tamilnadu Government may also be
>working on this issue.

Although some obscure Tamil characters and some symbols are missing from
the Unicode block for Tamil, the repertoire encoded in Unicode is sufficient
for encoding all normal Tamil texts. What is missing from Unicode are
characters for all possible ligatures of consonants and vowel signs, special
variants of vowel signs, and the conjunct ksa. This is intentional, as these
can be encoded as combinations of the basic characters, and software will
automatically select the right glyphs. In Tamil, like other Indian scripts,
and unlike English, there is no one-to-one relationship between the
characters and the glyphs displayed. Although this complicates presentation
software, it very much simplifies almost any other application, including
changing from one font to another.
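
As a rough illustration of the point above, here is a minimal Python sketch
that lists the basic characters behind two Tamil forms that are usually drawn
with special glyphs. The code points and names are the ones in the Unicode
character database shipped with Python; the names KSSA, KO and describe are
my own, chosen only for this example. Which glyphs actually appear on screen
is up to the rendering software and the font, and is not shown here.

```python
import unicodedata

# The conjunct "ksa" is stored as three basic characters: KA + VIRAMA + SSA.
# A renderer is expected to pick a single conjunct glyph for this sequence.
KSSA = "\u0b95\u0bcd\u0bb7"

# The syllable "ko" is stored as the consonant KA followed by VOWEL SIGN O.
# The vowel sign is typically drawn as pieces on both sides of the consonant.
KO = "\u0b95\u0bca"


def describe(text):
    """Return one 'U+XXXX NAME' string per encoded character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, text in (("ksa", KSSA), ("ko", KO)):
        print(label)
        for line in describe(text):
            print("   ", line)
```

Searching, sorting and font changes work on these stored sequences, so they
do not need to know anything about the ligatures a particular font provides.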

>Further, there is a possibility for phonomic voice recognition to be
>implemented by using the principles of Tamil scripting. This can then be
>transported to other complex scripting systems in the second phase. If my
>understanding is correct, serious cosiderations for a solution to phonomic
>form of voice recognition (at present) is non existent.

This escapes me. Although Tamil writing is highly phonetic, I do not see
how this can help in voice recognition, still less how the complex rules of
typographic composition can help. I think it is better to keep phonetics and
character encoding separate. We are not encoding the sounds of languages,
but the symbols used in their scripts. They can be far apart, as in Chinese,
English or Tibetan, or more systematically related as in Tamil or IPA, but
they are still two different things.

>In view of the above and many other considerations,
>I would like to request comments from the consortium for the following
>proposal.
>
>1/ Code positions allocated to a group of languages to be amalgamated and by
>using some elected code positions (within this codes, if necessary) the
>component languages to be identified. As a starting point, Tamil, Telungu,
>Karnadaga and Malayalam may be amalgamated.

Unification of all Indic scripts has probably been proposed several times,
as all the alphabets have very much in common, and instant transliteration
by changing a font would be very convenient for the many users who can
understand several Indian languages but can read only one script: for
example, a Bengali wishing to read Oriya, or a Tamil wishing to read
Malayalam or Kannada (one for Avery, can I do this in Office 2000?).
However, the fine details of each script differ, each having some
peculiarities of its own, which makes table-driven transliteration necessary
and unification less desirable. As the current ISCII implementation shows,
it can be done, but I wouldn't advise it.
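
To make the table-driven point concrete, here is a small Python sketch. It
relies on the fact that the Unicode blocks for these scripts are laid out in
parallel, 128 code points each, so a plain offset shift handles the common
letters. The OVERRIDES table and the function transliterate are hypothetical
and for illustration only; the single entry in the table is an example of the
kind of exception needed, not a recommended mapping.

```python
import unicodedata

# Start of each 128-code-point block in Unicode.
BLOCK_START = {
    "bengali": 0x0980,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80

# Hypothetical exception table: Tamil has no aspirated KHA, so this example
# maps Malayalam KHA to Tamil KA. A real table would need many more entries.
OVERRIDES = {
    ("malayalam", "tamil"): {"\u0d16": "\u0b95"},
}


def transliterate(text, src, dst):
    """Shift characters from the src block to the dst block, using the table first."""
    src_start, dst_start = BLOCK_START[src], BLOCK_START[dst]
    table = OVERRIDES.get((src, dst), {})
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])  # script-specific exception
        elif src_start <= ord(ch) < src_start + BLOCK_SIZE:
            candidate = chr(ord(ch) - src_start + dst_start)
            # Keep the original character if the shifted position is unassigned.
            out.append(candidate if unicodedata.name(candidate, None) else ch)
        else:
            out.append(ch)  # outside the source block: leave untouched
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("\u0995", "bengali", "oriya"))    # Bengali KA
    print(transliterate("\u0d16", "malayalam", "tamil"))  # Malayalam KHA
```

The offset shift covers the shared part of the layout; everything that makes
each script peculiar ends up in the exception table, which is where the real
work lies.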

Jeroen Hellingman


