From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Jun 25 2005 - 19:46:36 CDT
Sinnathurai Srivas wrote:
> Unfortunately, on the issue of collation, due to designs of ISCII, Unicode
> has to abandon the sorting based encoding of Tamil in favour of
> transliteration based encoding.
> For example Tamil K will indicate k, h, g, q, x and other related phoneme
> while Devanagari would have individual character shapes representing
> individual phonemes. Tamil is based on Alphabet based phonemic system,
> while Devanagari is based on phonemic system.
I think you mean that Tamil spelling uses digraphs for consonants while
Devanagari uses single letters. Unless the Tamil digraphs are sorted like
single letters, this happens to be irrelevant for Unicode.
> If Unicode changes it's policy from the unimportant and non functioning
> transliteration based encoding to one of natural sorting based encoding
> would be a superior solution. However, expecting Unicode to change it's
> encoding philosophy of ISCII based transliteration encoding to one of
> natural sorting based encoding is not going to be easy.
You may care to view the UCA weights as a temporary conversion to a
sorting-based encoding.
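A minimal sketch of that idea, assuming a toy weight table (the values below are hypothetical and are not real UCA weights): the stored text keeps its encoding, and a key made of weights is derived only for the comparison.

```python
# Hypothetical weights for illustration only; these are NOT real UCA values.
# The order is deliberately different from code point order.
TOY_WEIGHTS = {"b": 1, "a": 2, "c": 3}


def sort_key(word):
    """Derive a tuple of weights from a stored string, for comparison only."""
    # Characters missing from the table sort after all listed ones,
    # in code point order among themselves.
    return tuple(TOY_WEIGHTS.get(ch, len(TOY_WEIGHTS) + 1 + ord(ch)) for ch in word)


words = ["cab", "abc", "bca"]
ordered = sorted(words, key=sort_key)  # compares the weights, not the code points
```

The strings in `words` are never re-encoded; only the temporary keys follow the sorting order.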
> We will need to work on what is imposed on Tamil and find software
> solutions to resolve sorting requirements.
If Tamil sorting can be expressed purely by a sorting order of consonants
and vowels, then the answer for sorting words is simply to rearrange the
weights on vowels and consonants in the default UCA to accord with this
ordering.
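As a rough sketch of such a rearrangement, assuming the consonant order listed below is the one wanted (the list, the placement of the Grantha letters after the native ones, and the names `TRADITIONAL_ORDER`, `RANK` and `tailored_key` are all assumptions for illustration), a rank table can stand in for the tailored weights. It covers bare consonant letters only; vowel signs and the pulli would need weights too, and a real tailoring would adjust the UCA weights themselves rather than use a table like this.

```python
# Assumed target order: native Tamil consonants, then Grantha letters.
TRADITIONAL_ORDER = [
    "\u0b95", "\u0b99", "\u0b9a", "\u0b9e",  # KA, NGA, CA, NYA
    "\u0b9f", "\u0ba3", "\u0ba4", "\u0ba8",  # TTA, NNA, TA, NA
    "\u0baa", "\u0bae", "\u0baf", "\u0bb0",  # PA, MA, YA, RA
    "\u0bb2", "\u0bb5", "\u0bb4", "\u0bb3",  # LA, VA, LLLA, LLA
    "\u0bb1", "\u0ba9",                      # RRA, NNNA
    "\u0b9c", "\u0bb7", "\u0bb8", "\u0bb9",  # Grantha: JA, SSA, SA, HA
]

# Position in the list plays the role of the rearranged weight.
RANK = {ch: i for i, ch in enumerate(TRADITIONAL_ORDER)}


def tailored_key(word):
    """Key built from the rearranged weights; unlisted characters sort last."""
    return tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in word)


letters = ["\u0ba9", "\u0baa", "\u0bb1", "\u0bb2"]  # NNNA, PA, RRA, LA
by_code_point = sorted(letters)                     # raw encoding order
by_tradition = sorted(letters, key=tailored_key)    # assumed traditional order
```

Here the code point order gives NNNA, PA, RRA, LA, while the rearranged weights give PA, LA, RRA, NNNA, without any change to how the letters are encoded.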
> Tamil Grammar, probably the worlds oldest written and a sophisticated
> Grammar, clearly defines authography for Tamil. Here again Unicode does
> not seem to beleive that a language can have Grammar defining it's
> authography. In this regard it is not too late to bring to the attention
> of Unicode consortium that how authography is defined and how sorting is
> used.
Does the Tolkappiyam specify the use of Grantha letters? If it doesn't,
then it doesn't specify the orthography (note spelling) of Tamil. However,
orthography is often totally irrelevant for collation, as it is for English
and Thai.
> We will analise the requirements to be able to collate Tamil, by ways of
> software fixes.
Just look at tailoring the UCA.
> To be continued....
I hope with some constructive suggestions.
Richard.