Errors in the Indic FAQ

From: Andy White (
Date: Sun Nov 17 2002 - 09:38:53 EST

    A graphical version of this message available here:

    The Indic Unicode FAQ proposes that Bengali Khanda_Ta should be encoded as Ta Virama ZWJ ... and that an explicit Ta_Virama can be encoded as Ta Virama ZWNJ. This information is wrong and must be changed.

    First some background facts for the unacquainted.
    Khanda Ta is equivalent to Ta Virama, i.e. it is a halant form of Ta.
    Bengalis regard Khanda Ta as a separate letter from Ta.

    It is incorrect and nonsensical to place a vowel sign immediately after a Virama,
    e.g. the sequence Ta Virama VowelSign.i is wrong. (This sequence implies the rendering VowelSign.i Ta Virama, with VowelSign.i reordered to the front, which is illogical.)
    It follows that it is also nonsensical to place a vowel sign immediately after a Khanda Ta, since Khanda Ta is equivalent to Ta + Virama.
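
    As a rough illustration of the rule above, the following Python sketch flags any dependent vowel sign that immediately follows a Virama in a Bengali string. The function name and the vowel-sign set are assumptions made for this example, not part of any standard API; the code points are those of the Unicode Bengali block.

```python
# Code points from the Unicode Bengali block.
TA = "\u09A4"            # BENGALI LETTER TA
VIRAMA = "\u09CD"        # BENGALI SIGN VIRAMA
VOWEL_SIGN_I = "\u09BF"  # BENGALI VOWEL SIGN I

# Common dependent vowel signs (AA..VOCALIC RR, E, AI, O, AU).
# Assumption: this set is enough for illustration; it is not exhaustive.
VOWEL_SIGNS = frozenset(
    chr(cp)
    for cp in (*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC)
)


def vowel_signs_after_virama(text):
    """Return the indices of vowel signs that directly follow a Virama."""
    return [
        i
        for i in range(1, len(text))
        if text[i] in VOWEL_SIGNS and text[i - 1] == VIRAMA
    ]
```

    Under these assumptions, Ta Virama VowelSign.i is flagged at index 2, while Ta Virama Ta VowelSign.i is not flagged, because the vowel sign there follows a consonant.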

    In the Devanagari script (as used for Hindi), you may write the sequence Ka Virama Ta VowelSign.i, and it may be rendered as VowelSign.i followed by a fully ligated conjunct. However, if you do not want this fully ligated form, you may use the sequence Ka Virama ZWJ Ta VowelSign.i and have it rendered as VowelSign.i Half_Ka Ta.
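
    For readers who want to try the two Devanagari sequences, here is a minimal Python sketch that builds both strings and lists the Unicode character names. The variable and function names are made up for this example; how each string is actually drawn depends on the font and rendering engine, which the code does not model.

```python
import unicodedata

# Code points from the Unicode Devanagari block, plus the joiner.
KA = "\u0915"            # DEVANAGARI LETTER KA
VIRAMA = "\u094D"        # DEVANAGARI SIGN VIRAMA
TA = "\u0924"            # DEVANAGARI LETTER TA
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I
ZWJ = "\u200D"           # ZERO WIDTH JOINER

# Sequence that may be rendered with a fully ligated conjunct.
full_conjunct = KA + VIRAMA + TA + VOWEL_SIGN_I
# Sequence that requests the half form of Ka instead.
half_form = KA + VIRAMA + ZWJ + TA + VOWEL_SIGN_I


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]
```

    Calling describe(half_form) shows that the only difference between the two strings is the ZERO WIDTH JOINER after the Virama.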

    Now turning to the Bengali example of Ta Virama Ta VowelSign.i

    Ta Virama Ta VowelSign.i may be rendered as: VowelSign.i Ta_Ta.fullyligated
    And going by the FAQ:
    Ta Virama ZWJ Ta VowelSign.i would be rendered as: VowelSign.i KhandaTa Ta
    But this is clearly wrong, as Khanda Ta has now taken on a vowel sign, which is illegal.

    What was needed here was a ZWNJ to separate the Ta Virama from the following Ta.
    But according to the FAQ, Ta Virama ZWNJ Ta is to be rendered as: Ta_Virama.explicit, Ta (Ta with a visible Virama, Ta).
    This seems to imply that Ta Virama ZWNJ Ta VowelSign.i would be rendered as: Ta_Virama.explicit, VowelSign.i Ta

    I hope that it is clear from this example that the behaviour of Ta Virama in conjunction with ZWJ & ZWNJ needs to be changed.

    Furthermore, ZWJ should be used to form half consonants in Indic scripts, but Khanda_Ta is not a half form: it is regularly used as the last letter of a word, which half forms never are.
    The behaviour should be as follows:

    Ta Virama ZWNJ ... should lead to KhandaTa (i.e. the halant form of Ta)

    e.g. the Bengali word kutsit shall be encoded as:
       Ka VowelSign.u Ta Virama ZWNJ Sa VowelSign.i Ta
    and rendered as:
       Ka VowelSign.u KhandaTa VowelSign.i Sa Ta
       (ZWNJ marks the separation point, preventing VowelSign.i from attaching to the preceding Ta.)

    Ta Virama ZWJ ... should lead to a half form of Ta, which I suggest should be Ta with a visible Virama (there is no half form of Ta in Bengali)
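
    The kutsit encoding above can be written out as code points. This is only a sketch of the sequence proposed in this message; the helper name code_points is invented for the example.

```python
# Code points from the Unicode Bengali block, plus the non-joiner.
KA = "\u0995"            # BENGALI LETTER KA
VOWEL_SIGN_U = "\u09C1"  # BENGALI VOWEL SIGN U
TA = "\u09A4"            # BENGALI LETTER TA
VIRAMA = "\u09CD"        # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"          # ZERO WIDTH NON-JOINER
SA = "\u09B8"            # BENGALI LETTER SA
VOWEL_SIGN_I = "\u09BF"  # BENGALI VOWEL SIGN I

# kutsit as proposed: Ka VowelSign.u Ta Virama ZWNJ Sa VowelSign.i Ta
kutsit = KA + VOWEL_SIGN_U + TA + VIRAMA + ZWNJ + SA + VOWEL_SIGN_I + TA


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(kutsit))
# prints: U+0995 U+09C1 U+09A4 U+09CD U+200C U+09B8 U+09BF U+09A4
```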

    To conclude, I recommend that in general:

    Ta Virama ... -> KhandaTa (i.e. halant form of Ta) if the following letter does not naturally ligate with it,
    else, Ta Virama ... -> conjunct form
    Ta Virama ZWNJ -> KhandaTa (i.e. explicit halant form)
    Ta Virama ZWJ -> Ta Virama (as Bengali does not have a half form of this character).
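
    The four rules above can be sketched as a small Python decision function. This is a hypothetical illustration, not an existing rendering API: the function name, the returned labels, and the LIGATING_FOLLOWERS set are all assumptions. In practice, which letters ligate with Ta is decided by the font; the set here contains only Ta, taken from the Ta Virama Ta example earlier in this message.

```python
TA = "\u09A4"    # BENGALI LETTER TA
SA = "\u09B8"    # BENGALI LETTER SA
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Hypothetical stand-in for font knowledge: letters that ligate with Ta.
LIGATING_FOLLOWERS = frozenset({TA})


def proposed_form(following, ligating=LIGATING_FOLLOWERS):
    """Pick the form of 'Ta Virama' given the text that follows it.

    `following` is whatever comes after the Ta Virama pair; an empty
    string means Ta Virama ends the word.
    """
    if following.startswith(ZWNJ):
        return "KhandaTa"                # explicit halant form
    if following.startswith(ZWJ):
        return "Ta with visible Virama"  # no half form of Ta in Bengali
    if following[:1] in ligating:
        return "conjunct"                # next letter ligates naturally
    return "KhandaTa"                    # default halant form
```

    For example, proposed_form(TA) yields "conjunct", while proposed_form(ZWNJ + SA) and proposed_form("") both yield "KhandaTa".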
