From: Antoine LECA (Antoine10646@leca-marti.org)
Date: Tue Dec 03 2002 - 16:50:54 EST
Hi folks,
This post is a bit long, so here is a summary:
- regarding the encodings of TMA, there are currently several possibilities,
so it should be possible to sort out all "normal" cases with current characters.
- however, this shows that ISCII provides a character, INV, with no
counterpart in Unicode. Perhaps this is the problem to be solved.
Andy White wrote on 2002-11-29 13:21:14Z:
>
> Marco wrote
>
>> - Does ISCII have a way to distinguish the two cases above
>> and the other possible combinations? I mean:
>> 1. Ta_Ma_Ligature,
>> 2. Khanda_Ta + Ma,
>> 3. Half_Ta + Ma,
>> 4. Ta + Virama + Ma.
>
>
> 1. Ta_Ma_Ligature is simply 'ta virama ma'
> 2. Khanda_Ta + Ma, is 'ta virama virama ma' (equivalent to 'ta virama zwnj ma')
> 3. Half_Ta + Ma is 'ta virama inv ma' (equivalent to 'ta virama zwj ma')
I fail to understand why it cannot (also) be coded as 'ta halant nukta ma'
using the "soft-halant" feature of ISCII, which is supposed to do just that
(see IS13194:1991 6.3.2).
I know iLeap (and ISFA in general) renders it incorrectly, but when I read
6.3.2 ("prevents it from combining with the following consonant"), I believe
that the iLeap software is in error here.
> 4. Ta + Virama + Ma should be 'ta virama virama inv ma' but this is not implemented in the iLeap application I am using!
I got an acceptable result with 'ta inv halant ma'. Of course this is a
complete hack (for example, a romanisation of the result will show the
incorrectness), but for visual purposes only, it does the job. And since
Ta + visible halant is not supposed to be anything useful for normal writing
(i.e. only useful for school teaching or similar tasks, as I understand
things; at least no Bengali words are supposed to be written this way),
it seems to me
The problem I have, and it is very well summarized by Andy and Marco here,
is that in ISCII-91 I see *three* mechanisms to vary the rendering:
- "Explicit Halant", coded E8 E8, described in 6.3.1
- "Soft Halant", coded E8 E9, described in 6.3.2
- "Invisible consonant INV", coded D9, described in 6.4, which further
may combine with the other two, but is intended only for rendering
purposes
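To make the three mechanisms concrete, here is a minimal sketch in Python of a scanner that tells them apart in a byte string. It assumes only the byte values quoted above (E8 for halant, E9 for nukta, D9 for INV). The function name and the token labels are made up for illustration, and the consonant bytes C2 (ta) and CC (ma) in the usage lines are my reading of the ISCII-91 table, to be checked against the standard.

```python
HALANT, NUKTA, INV = 0xE8, 0xE9, 0xD9  # byte values quoted in the post


def classify_iscii(data):
    """Label the halant-related mechanisms in a run of ISCII-91 bytes.

    Hypothetical helper: any byte that is not part of a mechanism is
    reported as two hex digits, without interpreting it.
    """
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT and nxt == HALANT:
            tokens.append("EXPLICIT_HALANT")  # E8 E8, section 6.3.1
            i += 2
        elif b == HALANT and nxt == NUKTA:
            tokens.append("SOFT_HALANT")      # E8 E9, section 6.3.2
            i += 2
        elif b == HALANT:
            tokens.append("HALANT")           # plain halant
            i += 1
        elif b == INV:
            tokens.append("INV")              # D9, section 6.4
            i += 1
        else:
            tokens.append(f"{b:02X}")
            i += 1
    return tokens


# Assumed consonant bytes: 0xC2 = ta, 0xCC = ma.
print(classify_iscii(bytes([0xC2, 0xE8, 0xE8, 0xCC])))
print(classify_iscii(bytes([0xC2, 0xE8, 0xD9, 0xCC])))
```

Note that the scanner pairs bytes greedily from the left, so 'ta virama virama inv ma' comes out as an Explicit Halant followed by INV.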
At the same time, Unicode (3.0) provides only *two* mechanisms:
- inserting ZWNJ after virama, called "Explicit Virama"
- inserting ZWJ after virama, called "Explicit Half-Consonant"
There is little doubt that "Explicit Virama" and "Explicit Halant" can be
paired: their descriptions are very similar.
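For reference, the Unicode side can be spelled out as code point sequences. This is only an illustrative sketch in Python for Bengali ta + ma; the helper names conjunct and code_points are invented here, while the code points themselves (U+09A4, U+09AE, U+09CD, U+200C, U+200D) are the standard ones.

```python
TA, MA, VIRAMA = "\u09A4", "\u09AE", "\u09CD"  # Bengali ta, ma, virama (hasant)
ZWNJ, ZWJ = "\u200C", "\u200D"


def conjunct(first, second, joiner=""):
    """Build consonant + virama + optional joiner + consonant."""
    return first + VIRAMA + joiner + second


def code_points(text):
    """Show a string as U+XXXX values, for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


plain = conjunct(TA, MA)                  # left to the renderer (ligature if available)
explicit_virama = conjunct(TA, MA, ZWNJ)  # "Explicit Virama"
explicit_half = conjunct(TA, MA, ZWJ)     # "Explicit Half-Consonant"

for name, seq in [("plain", plain),
                  ("explicit virama", explicit_virama),
                  ("explicit half-consonant", explicit_half)]:
    print(f"{name}: {code_points(seq)}")
```

How each sequence is actually displayed depends on the font and the rendering engine; the sketch only shows what is stored.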
However, I remember reading in Unicode 1.0 (unfortunately, I do not have it
at hand) that the position at DA (INV consonant, according to ISCII-88)
was equated to the ZWJ. While it might appear correct for some cases,
I do not believe this is correct. The Indic FAQ also has words on the
topic, but there are many things to comment on in this FAQ, so I won't
elaborate further (however, if the editor is reading, please contact me.)
I believe ZWJ could be equated to Soft Halant, as the descriptions are
similar (except the well-known exception of the eyelash-ra, as stated in
Unicode 2.0), despite the important difference in wording.
I understand that now Malayalam cillus are to be encoded with ZWJ, too.
As a result, we are left with one code in ISCII-91, INV (D9), which is
indeed quite special (its description makes clear that it is not used to write
any sound, it is merely an artefact, useful for specialized tasks), and which
ends up with no counterpart in Unicode, at least none that I can spot at once
(remember, it should be a character that shares the properties of the
"regular" consonants, i.e. ligating before or after virama, or before
vowel signs.) Perhaps, as the discussion above showed, it is really
this character that is missing to perform specialized tasks
with Indic scripts? (such as the Malayalam Half-U that I was speaking about
last month.)
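The gap can be seen in a toy converter. The following Python sketch uses the pairing argued for in this post (Explicit Halant to virama + ZWNJ, Soft Halant to virama + ZWJ), which is my belief and not an established mapping. The function name is invented, and the two-entry letter table (C2 for ta, CC for ma, mapped to Bengali) is an assumption for illustration only. Any byte it cannot map, INV included, is reported back instead of being converted.

```python
VIRAMA, ZWNJ, ZWJ = "\u09CD", "\u200C", "\u200D"  # Bengali virama, joiners

# Assumed ISCII-91 bytes for ta and ma, mapped to Bengali; illustration only.
LETTERS = {0xC2: "\u09A4", 0xCC: "\u09AE"}


def iscii_to_unicode(data, letters=LETTERS):
    """Convert a run of ISCII bytes, returning (text, unmapped).

    unmapped is a list of (offset, byte) pairs for bytes with no
    counterpart in this sketch, such as INV (0xD9).
    """
    out = []
    unmapped = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair == b"\xE8\xE8":        # Explicit Halant
            out.append(VIRAMA + ZWNJ)
            i += 2
        elif pair == b"\xE8\xE9":      # Soft Halant (pairing proposed in the post)
            out.append(VIRAMA + ZWJ)
            i += 2
        elif data[i] == 0xE8:          # plain halant
            out.append(VIRAMA)
            i += 1
        elif data[i] in letters:
            out.append(letters[data[i]])
            i += 1
        else:                          # nothing to map to, e.g. INV
            unmapped.append((i, data[i]))
            i += 1
    return "".join(out), unmapped


text, leftover = iscii_to_unicode(bytes([0xC2, 0xE8, 0xD9, 0xCC]))
print([f"U+{ord(c):04X}" for c in text], leftover)
```

With INV dropped, 'ta virama inv ma' collapses to the same text as plain 'ta virama ma', which is exactly the loss of information discussed above.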
Andy's new proposal, CBM, is a bit different, since it assigns precise
rules to solve some cases. The thing that makes me a bit reluctant is
that there is no prior art with CBM, so we could get it wrong a couple of
times, with subsequent rectifications, errata and changes of meaning,
all of which are bad things. On the other hand, including a new character, with
the same semantics as already present in ISCII, would ease some
conversions (I know it would be few), and also provide a reference to
implement.
Having said that, the first example of Andy, with the relative priorities
of reph versus jophola (and similar examples between reph and
rakar-vattu/vakar/yakar/lakar) remains to be examined in more detail.
Regards,
Antoine