From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 10 2003 - 20:39:12 EST
Peter Jacobi noted:
> but it would still hold, that:
> U+0B95 U+0BC6 U+0BB3 and
> U+0B95 U+0BCC
> are indistinguishable in written Tamil.
This is a true ambiguity in the writing system.
<U+0B95, U+0BC6, U+0BB3> ==> ke-l.a
<U+0B95, U+0BCC> ==> kau
Every analysis of Tamil that I have seen distinguishes the two
letters, l.a versus -au, despite the overlap in glyph form.
Encoding them distinctly therefore makes sense, even though
they participate in the visual ambiguity cited above.
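As a rough illustration of the two sequences quoted above, the following Python sketch lists the character names from the standard `unicodedata` module and checks that no normalization form folds one sequence into the other. The names `KE_LLA`, `KAU`, `describe` and `same_under_normalization` are made up for this example, and the results depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

# The two sequences from the discussion (names here are illustrative only).
KE_LLA = "\u0b95\u0bc6\u0bb3"  # ka + vowel sign e + letter l.a
KAU = "\u0b95\u0bcc"           # ka + vowel sign au


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_under_normalization(a, b):
    """True if any standard normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


if __name__ == "__main__":
    for label, seq in (("ke-l.a", KE_LLA), ("kau", KAU)):
        print(label, describe(seq))
    print("equal under some normalization form:",
          same_under_normalization(KE_LLA, KAU))
```

The sequences stay distinct in every form. Under NFD the -au sign decomposes into U+0BC6 plus the length mark U+0BD7, not into U+0BC6 plus U+0BB3, so any software that wants to flag the two spellings as visually confusable has to do so by some means other than normalization.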
There is also a graphological reason for the
distinction. The -au character is a dependent vowel sign.
You can't add other vowels such as -ii (U+0BC0) or -u (U+0BC1)
to the rightmost glyph part of -au (the one that *looks*
like the l.a consonant). But you *can* add -ii or -u to
U+0BB3 l.a, so there is a clear difference in distribution
and interaction with other characters.
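The difference in distribution described above shows up, in a simplified way, in the Unicode general categories: l.a is a letter, while -au, -ii and -u are combining marks. The Python sketch below uses that as a stand-in. The helper names and the rule in `can_take_vowel_sign` are assumptions made for illustration, not a model of Tamil orthography.

```python
import unicodedata

LLA = "\u0bb3"      # letter l.a
AU_SIGN = "\u0bcc"  # dependent vowel sign -au
II_SIGN = "\u0bc0"  # dependent vowel sign -ii
U_SIGN = "\u0bc1"   # dependent vowel sign -u


def is_base_letter(ch):
    """True for characters with general category Lo (other letter)."""
    return unicodedata.category(ch) == "Lo"


def is_dependent_sign(ch):
    """True for combining marks (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def can_take_vowel_sign(ch):
    """Simplified assumption: only a base letter can host a vowel sign."""
    return is_base_letter(ch)


if __name__ == "__main__":
    for label, ch in (("l.a", LLA), ("-au", AU_SIGN)):
        print(f"{label}: category={unicodedata.category(ch)}, "
              f"can take -ii or -u: {can_take_vowel_sign(ch)}")
```

Under this simplification l.a can host -ii or -u and -au cannot, which matches the distributional difference even though the rightmost glyph part of -au looks the same as l.a.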
Finally, for ISCII interoperability, there was no choice
but separate encoding (not of the length mark per se, of
course, but of -au versus l.a).
--Ken