Somnath Kundu <skundu at cal dot vsnl dot net dot in> wrote:
> It appears that Bengali consonant "khanda ta" is not included in
> Unicode Standard 3.0 for Bengali script. "Khanda ta" is the halant
> form of "ta" (09A4) but it is considered a distinct consonant in
> Bengali script. It comes between 09DF and 0982, looks like the
> character 096F and is pronounced as 't', i.e., without the inherent
> vowel 'a' in 'ta'. There are many common words in Bengali that use
> this consonant.
> Can someone shed some light on why it was not included in the Unicode
> Standard and how the Unicode Consortium intends to support it? I am
> asking this question because I see that there is a problem supporting
> it as a combination of 09A4+09CD, because that sequence is used to
> create the half form of 'ta'. It is also not in the list of proposed
> characters.
> Keenly awaiting any reply,
Indic scripts aren't exactly my strong point, but in the interest of
providing *any* reply...
"Khanda-ta" appears to be "ta" with the inherent "a" killed. That would
normally point to the use of "ta" (U+09A4) followed by the Bengali
virama (U+09CD). If this sequence results in a "half ta" glyph which is
different from khanda-ta, then the sequence ta + virama + ZWJ (U+200D)
should be used instead.
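To make the two candidate sequences concrete, here is a minimal Python sketch that just spells them out code point by code point. It assumes nothing beyond the standard unicodedata module; the helper name describe() is my own invention, and the code says nothing about which glyph a given font or rendering engine will actually produce.

```python
import unicodedata

TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def describe(text):
    """Return (code point, character name) pairs for each character in text.

    Hypothetical helper for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The two sequences discussed above.
ta_virama = TA + VIRAMA            # "ta" with the inherent vowel killed
ta_virama_zwj = TA + VIRAMA + ZWJ  # suggested when the shorter sequence
                                   # renders as a different "half ta" glyph

for label, seq in (("ta + virama", ta_virama),
                   ("ta + virama + ZWJ", ta_virama_zwj)):
    print(label)
    for code, name in describe(seq):
        print("   ", code, name)
```

Both strings are ordinary text; whether they display differently is entirely up to the font and shaping engine.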
Many "characters" or character forms in Unicode, especially in Indic and
other complex scripts, are implemented as sequences involving the
virama (a combining mark) and format characters such as ZWJ and ZWNJ.
Also note that the concept of "comes between" has to do with collation,
which is language-dependent and not related to Unicode code point order.
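The difference between code point order and alphabet order can be shown with a small Python sketch. The three-item ALPHABET_ORDER list below is a hypothetical fragment built only from the ordering claimed in the quoted question (khanda-ta between 09DF and 0982); it is not a real Bengali collation table, and khanda-ta is represented by the ta + virama + ZWJ sequence purely as an assumption.

```python
YYA = "\u09DF"                    # BENGALI LETTER YYA
ANUSVARA = "\u0982"               # BENGALI SIGN ANUSVARA
KHANDA_TA = "\u09A4\u09CD\u200D"  # assumed sequence: ta + virama + ZWJ

letters = [ANUSVARA, KHANDA_TA, YYA]

# Python compares strings by code point, so this is plain code point order.
by_code_point = sorted(letters)

# Hypothetical alphabet fragment taken from the quoted question.
ALPHABET_ORDER = [YYA, KHANDA_TA, ANUSVARA]

# Sort by position in the alphabet fragment instead of by code point.
by_alphabet = sorted(letters, key=ALPHABET_ORDER.index)

def codes(items):
    """Render each item as a space-separated list of U+XXXX code points."""
    return [" ".join(f"U+{ord(ch):04X}" for ch in item) for item in items]

print("code point order:", codes(by_code_point))
print("alphabet order:  ", codes(by_alphabet))
```

The two orderings come out reversed for this fragment, which is the point: where a letter sits in the alphabet is a matter for a collation table, not for the code chart.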
Now I have a question for the true Unicode/Indic experts:
This mailing list gets a LOT of questions asking why Indic
half-consonants and other forms (such as khanda-ta) aren't separately
encoded in Unicode. The Unicode model for Indic scripts is supposedly
based on ISCII-1988. How were these problems handled in ISCII? Do
users of ISCII have the same problems? Are there significant
differences between the ISCII and Unicode approach to these issues, and
if so, should Unicode spell out more explicitly what those differences
are? (The FAQ talks rather generally about "in some cases" and "in
other cases.") Or are these questions being asked by people who have
previously used ASCII-hacked font solutions instead of ISCII?