From: Bala (bala@cse.mrt.ac.lk)
Date: Mon Nov 05 2007 - 02:30:13 CST
Dear James Kass,
Thank you very much for the detailed reply.
We are in the process of defining the standards for TAMIL (தமிழ்) CHARACTER CODE FOR INFORMATION INTERCHANGE in Sri Lanka. I would like to summarize a few points for clarification.
(1)
In summary, Unicode wants the Grantha Shri in Tamil to be encoded in the following way (image1):
0BB6 + 0BCD + 0BB0 + 0BC0
ஶ + ் + ர + ீ
The old encoding below is no longer suggested for future use:
0BB8 + 0BCD + 0BB0 + 0BC0
(2)
What will be the encoding for image2?
(3)
Since each new sequence added to the Unicode Standard will "break" existing data, Unicode wants to keep the encoding of the Grantha க்ஷ in Tamil as follows:
0B95 + 0BCD + 0BB7
க + ் + ஷ
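The sequences in points (1) and (3) can be checked with Python's standard `unicodedata` module. This is only a minimal sketch: the dictionary labels and the `describe` helper are illustrative names, not part of any standard. It lists each code point by name and shows that normalization does not merge the SA-based and SHA-based spellings of Shri, so software treats them as two different strings.

```python
import unicodedata

# Sequences from points (1) and (3), written as escapes to avoid font/IME ambiguity.
SEQUENCES = {
    "SHRI (suggested)": "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + VIRAMA + RA + II
    "SHRI (old, SA-based)": "\u0BB8\u0BCD\u0BB0\u0BC0",  # SA + VIRAMA + RA + II
    "KSSA": "\u0B95\u0BCD\u0BB7",  # KA + VIRAMA + SSA
}


def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in SEQUENCES.items():
    print(label, describe(seq))

# SA and SHA are not canonically equivalent, so NFC leaves both spellings
# as they are: a search for one will not find the other.
old = unicodedata.normalize("NFC", SEQUENCES["SHRI (old, SA-based)"])
new = unicodedata.normalize("NFC", SEQUENCES["SHRI (suggested)"])
print(old == new)  # False
```

Note that க்ஷ has no precomposed character: it is always the three-code-point sequence above.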
Again, thank you very much for your replies.
Kind regards
Bala
-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of James Kass
Sent: Monday, November 05, 2007 12:28 PM
To: 'Unicode Mailing List'; indic@unicode.org
Subject: Re: Tamil Sri / Shri
----- Original Message -----
From: "Bala" <bala@cse.mrt.ac.lk>
>> But it appears that one ligature uses RA and the other uses RRA.
>>
>> U+0BB0 ர TAMIL LETTER RA
>> U+0BB1 ற TAMIL LETTER RRA
>
> I feel very uncomfortable when the Tamil letters (ர, ற) are brought
> into the discussion of how sri/shri is formed. ஶ and ஸ are Grantha
> letters. This is like elements of two different scripts forming a
> grammatical form.
It is my impression that all of the Tamil letters have their counterparts
in Grantha writing. If this is so, then I wonder if these old-fashioned
letters are based on the SHA plus the Grantha equivalents of TAMIL
LETTER RA and TAMIL LETTER RRA.
Perhaps we can all agree that the ISCII model was an unfortunate choice
from the viewpoint of many modern Tamil users.
Unicode defines a sequence for the "shrii" glyph in Tamil Unicode text.
As evidence of more old-fashioned letters like these becomes available,
new sequences will probably be added to the standard.
Each new sequence added to the standard will "break" existing data for
those Tamil users who do not wish to see those old-fashioned letters.
This is because the "virama model" which came from ISCII requires that
the Indic script "conjuncts" be formed as part of the default condition
of text.
When users want to block the formation of "conjuncts", they must
enter a special formatting character (in this case, U+200C ZERO WIDTH
NON-JOINER). Since users in the past probably didn't expect that any
new "conjunct" sequences would be added to the standard, they would
not have been able to predict where to put any of those U+200C
characters in their texts.
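To make the ZERO WIDTH NON-JOINER mechanism concrete, here is a minimal Python sketch. The function names are hypothetical, and the range U+0B95..U+0BB9 is assumed to cover the Tamil consonant letters (the block has unassigned gaps there). Placing U+200C after the virama asks the renderer to show the virama explicitly instead of forming a conjunct. This blunt version blocks every virama + consonant pair, which is one way a user who cannot predict future conjunct sequences might protect text in advance.

```python
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def is_tamil_consonant(ch: str) -> bool:
    # Assumption: Tamil consonant letters lie in U+0B95..U+0BB9.
    return "\u0B95" <= ch <= "\u0BB9"


def block_all_conjuncts(text: str) -> str:
    """Insert ZWNJ between every virama and a directly following consonant.

    Pairs already separated by ZWNJ are left alone, so the function is idempotent.
    """
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == VIRAMA and i + 1 < len(text) and is_tamil_consonant(text[i + 1]):
            out.append(ZWNJ)
    return "".join(out)


kssa = "\u0B95\u0BCD\u0BB7"  # KA + VIRAMA + SSA
print([f"U+{ord(c):04X}" for c in block_all_conjuncts(kssa)])
# ['U+0B95', 'U+0BCD', 'U+200C', 'U+0BB7']
```

The cost of this approach is an extra invisible character in every cluster, which affects string comparison and searching; that is why a targeted, list-driven approach is preferable.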
Although we may all agree that a choice made in the past was unfortunate,
we should not "live in the past", and we cannot change the past. We must
live in the present, and we may look toward the future.
There are many Tamil computing professionals who are eminently qualified
to make good, practical input methods and other software applications.
It would be helpful if Tamil (and Grantha and Tamil Grantha) scholars could
make a listing of forms which are needed to represent historic Tamil texts.
This would serve as a basis for education of those Tamil computing
professionals, as well as the rest of us. With this knowledge, input method
programmers could devise solid, workable solutions to ensure that
old-fashioned letters would not appear if the author of a document does not
want them to appear, while preserving the option of displaying those
letters if an author *does* want them to appear.
With a complete listing, programmers could design input methods to
contextually and automatically insert the special NON-JOINER formatting
character, wherever it is needed, based on user preference.
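A rough sketch of the list-driven input-method idea, in Python. `CONJUNCT_SEQUENCES` and `apply_preference` are hypothetical names, and the list is deliberately incomplete: it contains only the two sequences discussed in this thread, standing in for the complete listing that scholars would need to supply. When the author prefers not to see the old-fashioned letters, ZWNJ is inserted right after the virama of each listed sequence. Everything else is left untouched.

```python
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Hypothetical and incomplete: a real listing would come from Tamil/Grantha scholars.
CONJUNCT_SEQUENCES = [
    "\u0B95\u0BCD\u0BB7",        # KA + VIRAMA + SSA (க்ஷ)
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + VIRAMA + RA + II (Shri)
]


def apply_preference(text: str, show_conjuncts: bool) -> str:
    """Insert ZWNJ into listed sequences unless the author wants the conjuncts shown."""
    if show_conjuncts:
        return text
    for seq in CONJUNCT_SEQUENCES:
        cut = seq.index(VIRAMA) + 1  # position just after the virama
        text = text.replace(seq, seq[:cut] + ZWNJ + seq[cut:])
    return text


shri = "\u0BB6\u0BCD\u0BB0\u0BC0"
print([f"U+{ord(c):04X}" for c in apply_preference(shri, show_conjuncts=False)])
# ['U+0BB6', 'U+0BCD', 'U+200C', 'U+0BB0', 'U+0BC0']
```

Because an inserted ZWNJ breaks the match, running the function again changes nothing. Adding a newly standardized sequence is then a one-line change to the list rather than a manual edit of existing documents.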
Without a complete listing, there are bound to be unpleasant surprises.
Best regards,
James Kass
This archive was generated by hypermail 2.1.5 : Mon Nov 05 2007 - 02:32:49 CST