RE: Tamil 0BB3 and 0BD7

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 10 2003 - 20:39:12 EST

Next message: jameskass@att.net: "Re: Ciphers (Was: Berber/Tifinagh)"

Previous message: Francois Yergeau: "TR: STD 63, RFC 3629 on UTF-8, a transformation format of ISO 10646"
Maybe in reply to: Peter Jacobi: "Tamil 0BB3 and 0BD7"

Peter Jacobi noted:

> but it would still hold, that:
> U+0B95 U+0BC6 U+0BB3 and
> U+0B95 U+0BCC
> are indistinguishable in written Tamil.

This is a true ambiguity in the writing system.

<U+0B95, U+0BC6, U+0BB3> ==> ke-l.a

<U+0B95, U+0BCC> ==> kau

Every analysis of Tamil that I have seen distinguishes the two
letters, l.a versus -au, despite the overlap in glyph form.
So encoding them distinctly clearly makes sense, even though
they participate in the visual ambiguity cited above.
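
As a rough illustration of the point above, the following Python sketch uses the standard unicodedata module to show that the two sequences are different code point sequences and stay different under all four normalization forms. The helper names (describe, distinct_under_normalization) are made up for this example. Note that U+0BCC decomposes canonically to <U+0BC6, U+0BD7>, that is, to the length mark and not to U+0BB3.

```python
import unicodedata

# The two sequences from the message above.
KE_LLA = "\u0b95\u0bc6\u0bb3"  # KA + VOWEL SIGN E + LLA -> ke-l.a
KAU = "\u0b95\u0bcc"           # KA + VOWEL SIGN AU      -> kau


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def distinct_under_normalization(a, b):
    """Return True if a and b differ in every Unicode normalization form."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(
        unicodedata.normalize(form, a) != unicodedata.normalize(form, b)
        for form in forms
    )


if __name__ == "__main__":
    for label, seq in (("ke-l.a", KE_LLA), ("kau", KAU)):
        print(label, describe(seq))
    print("distinct:", distinct_under_normalization(KE_LLA, KAU))
```

The ambiguity is therefore purely visual: a renderer may draw the same glyphs for both, but no normalization step will ever merge the two encodings.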

There is also a graphological reason for the
distinction. The -au character is a dependent vowel.
You can't add other vowels such as -ii (U+0BC0) or -u (U+0BC1)
to the rightmost glyph part of -au (the one that *looks*
like the l.a consonant). But you *can* add -ii or -u to
U+0BB3 l.a, so there is a clear difference in distribution
and interaction with other characters.
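
The difference in distribution described above is reflected, at least roughly, in the General Category property. The sketch below assumes that "is a letter" is an acceptable proxy for "can serve as the base for a vowel sign"; that is a simplification for illustration, not a statement of the actual Tamil shaping rules, and the function names are hypothetical.

```python
import unicodedata

LLA = "\u0bb3"            # TAMIL LETTER LLA (l.a)
VOWEL_SIGN_AU = "\u0bcc"  # TAMIL VOWEL SIGN AU (-au)
VOWEL_SIGN_II = "\u0bc0"  # TAMIL VOWEL SIGN II (-ii)
VOWEL_SIGN_U = "\u0bc1"   # TAMIL VOWEL SIGN U (-u)


def is_letter(ch):
    """True if the General Category of ch is one of the letter categories."""
    return unicodedata.category(ch).startswith("L")


def is_mark(ch):
    """True if the General Category of ch is one of the mark categories."""
    return unicodedata.category(ch).startswith("M")


def can_host_vowel_sign(ch):
    """Rough proxy (an assumption): only a letter can be the base
    to which a dependent vowel sign attaches."""
    return is_letter(ch)


if __name__ == "__main__":
    for ch in (LLA, VOWEL_SIGN_AU):
        print(unicodedata.name(ch), "can host a vowel sign:",
              can_host_vowel_sign(ch))
```

Under that assumption, <U+0BB3, U+0BC0> is a base plus a dependent vowel, while U+0BCC is itself a dependent vowel and offers no base for -ii or -u.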

Finally, for ISCII interoperability, there was no choice
but separate encoding (not of the length mark per se, of
course, but of -au versus l.a).

--Ken


This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 21:20:06 EST