Re: Proposal to add Bengali Khanda Ta

From: Antoine LECA (Antoine10646@leca-marti.org)
Date: Tue Dec 03 2002 - 16:50:54 EST

    Hi folks,

    This post is a bit long, so here is a summary:
    - regarding the encodings of TMA, there are currently several possibilities,
    so it should be possible to sort out all "normal" cases with current characters.
    - however, this shows that ISCII provides a character, INV, with no
    counterpart in Unicode. Perhaps this is the problem to be solved.

    Andy White wrote on 2002-11-29 13:21:14Z:
    >
    > Marco wrote
    >
    >> - Does ISCII have a way to distinguish the two cases above
    >> and the other possible combinations? I mean:
    >> 1. Ta_Ma_Ligature,
    >> 2. Khanda_Ta + Ma,
    >> 3. Half_Ta + Ma,
    >> 4. Ta + Virama + Ma.
    >
    >
    > 1. Ta_Ma_Ligature is simply 'ta virama ma'
    > 2. Khanda_Ta + Ma, is 'ta virama virama ma' (equivalent to 'ta virama zwnj ma')
    > 3. Half_Ta + Ma is 'ta virama inv ma' (equivalent to 'ta virama zwj ma')

    I fail to understand why it cannot (also) be coded as 'ta halant nukta ma'
    using the "soft-halant" feature of ISCII, which is supposed to do just that
    (see IS13194:1991 6.3.2).
    I know iLeap (and ISFA in general) renders it incorrectly, but when I read
    6.3.2 ("prevents it from combining with the following consonant"), I believe
    that the iLeap software is in error here.

    > 4. Ta + Virama + Ma should be 'ta virama virama inv ma' but this is not implemented in the iLeap application I am using!

    I got an acceptable result with 'ta inv halant ma'. Of course this is a
    complete hack (for example, a romanisation of the result will show the
    incorrectness), but for visual purposes only, it does the job. And since
    Ta + visible halant is not supposed to be anything useful for normal writing
    (i.e. only useful for school teaching or similar tasks, as I understand
    things; at least no Bengali words are supposed to be written this way),
    it seems to me

    The problem I have, and it is very well summarised by Andy and Marco here,
    is that in ISCII-91 I see *three* mechanisms to vary the rendering:
         "Explicit halant", coded E8 E8, described in 6.3.1
         "Soft Halant", coded E8 E9, described in 6.3.2
         "Invisible consonant INV", coded D9, described in 6.4, which further
            may combine with the other two, but is intended only for rendering
            purpose
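
    To make the three byte-level mechanisms listed above concrete, here is a
    minimal Python sketch that labels them in a byte stream. The values E8 E8,
    E8 E9 and D9 are the ones given in the list, with E9 being the nukta of
    the 'ta halant nukta ma' spelling. The function name, the label strings,
    and the bytes C2 and CC standing for ta and ma in the usage line are
    assumptions made for illustration only.

```python
# Byte values as quoted in this message for ISCII-91.
HALANT = 0xE8
NUKTA = 0xE9
INV = 0xD9


def classify_halants(data: bytes) -> list:
    """Label each byte (or halant byte pair) with the mechanism it encodes."""
    labels = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT and nxt == HALANT:
            labels.append("explicit-halant")  # E8 E8, section 6.3.1
            i += 2
        elif b == HALANT and nxt == NUKTA:
            labels.append("soft-halant")      # E8 E9, section 6.3.2
            i += 2
        elif b == HALANT:
            labels.append("halant")           # plain halant
            i += 1
        elif b == INV:
            labels.append("inv")              # D9, section 6.4
            i += 1
        else:
            labels.append(f"byte-{b:02X}")    # anything else is left opaque
            i += 1
    return labels


# Assumption: 0xC2 and 0xCC stand for ta and ma; they are opaque to the sketch.
print(classify_halants(bytes([0xC2, 0xE8, 0xE9, 0xCC])))
```

    Pairs are matched greedily from the left, so 'ta halant halant inv ma'
    yields an explicit halant followed by INV, which is how the two are
    described as combining.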

    At the same time, Unicode (3.0) provides only *two* mechanisms:
         inserting ZWNJ after virama, called "Explicit Virama"
         inserting ZWJ after virama, called "Explicit Half-Consonant"

    There is little doubt that "Explicit Virama" and "Explicit Halant" can be
    paired: their descriptions are very similar.
    However, I remember reading in Unicode 1.0 (unfortunately, I do not have it
    at hand) that the position at DA (INV consonant, according to ISCII-88)
    was equated to the ZWJ. While that might appear right in some cases,
    I do not believe it is correct. The Indic FAQ also has words on the
    topic, but there are many things to comment on in this FAQ, so I won't
    elaborate further (however, if the editor is reading, please contact me.)
    I believe ZWJ could be equated to Soft Halant, as the descriptions are
    similar (except the well-known exception of the eyelash-ra, as stated in
    Unicode 2.0), despite the important difference in wording.
    I understand that Malayalam cillus are now to be encoded with ZWJ, too.
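
    The pairing argued for here can be sketched as a small converter. This is
    only an illustration under stated assumptions: Explicit Halant (E8 E8) maps
    to virama + ZWNJ, Soft Halant (E8 E9) maps to virama + ZWJ (the tentative
    equation proposed in this message, not an established rule), and INV (D9)
    is rejected because no counterpart has been identified. The function name,
    the two-entry consonant table, and the ISCII bytes C2 and CC for ta and ma
    are hypothetical choices for the example.

```python
HALANT, NUKTA, INV = 0xE8, 0xE9, 0xD9        # ISCII-91 bytes quoted in this message

VIRAMA = "\u09CD"                            # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"                              # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"                               # ZERO WIDTH JOINER

# Assumption: ISCII bytes for ta and ma, mapped to BENGALI LETTER TA / MA.
CONSONANTS = {0xC2: "\u09A4", 0xCC: "\u09AE"}


def iscii_to_unicode(data: bytes) -> str:
    """Convert a tiny ISCII subset using the pairing proposed in the text."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT and nxt == HALANT:
            out.append(VIRAMA + ZWNJ)        # Explicit Halant -> Explicit Virama
            i += 2
        elif b == HALANT and nxt == NUKTA:
            out.append(VIRAMA + ZWJ)         # Soft Halant -> ZWJ (proposed pairing)
            i += 2
        elif b == HALANT:
            out.append(VIRAMA)               # plain halant -> plain virama
            i += 1
        elif b == INV:
            raise ValueError("INV (0xD9) has no identified Unicode counterpart")
        elif b in CONSONANTS:
            out.append(CONSONANTS[b])
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not covered by this sketch")
    return "".join(out)


# 'ta halant nukta ma' becomes ta + virama + ZWJ + ma under the proposed pairing.
print([f"U+{ord(c):04X}" for c in iscii_to_unicode(bytes([0xC2, 0xE8, 0xE9, 0xCC]))])
```

    Raising an error on INV, rather than silently mapping it to ZWJ, keeps the
    open question visible instead of hiding it inside the conversion.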

    As a result, we are left with one code in ISCII-91, INV (D9), which is
    indeed quite special (its description makes clear it is not used to write
    some sound, it is merely an artefact, useful for specialized tasks), and
    which ends up with no counterpart in Unicode, at least none that I can spot
    at once (remember, it should be a character that shares the properties of
    the "regular" consonants, i.e. ligating before or after virama, or before
    vowel signs.) Perhaps, as the discussion above showed, it is really
    this character that is missing for performing specialized tasks
    with Indic scripts? (such as the Malayalam Half-U that I was speaking about
    last month.)

    Andy's new proposal, CBM, is a bit different, since it assigns precise
    rules to solve some cases. The thing that makes me a bit reluctant is
    that there is no prior art for CBM, so we could get it wrong a couple of
    times, with subsequent rectifications, errata and changes of meaning,
    all of which are bad things. On the other hand, including a new character,
    with the same semantics as one already present in ISCII, would ease some
    conversions (few, I know), and also provide a reference for
    implementations.
    Having said that, Andy's first example, with the relative priorities
    of reph versus jophola (and similar examples between reph and
    rakar-vattu/vakar/yakar/lakar) remains to be examined in more detail.

    Regards,
    Antoine


