FAQ proposal (was RE: Combining letters in Devanagiri)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Fri Feb 22 2002 - 10:21:01 EST


Varada wrote:
> I am developing an uni code editor for Devanagiri and have a
> clarification on combine letters in devanagiri.
>
> For Eg if have to form a word that like "PATNI" It should
> have first
> half of "PA" + "TA" + "NA" + "I" .
>
> So also if I have to form a word "HAMSA" it should have full "HA" +
> half "MA" + full "SA".
>
> I downloaded the Unicode 3.2 beta and could not find codes for half
> letters. Would like to know how are these supported in Unicode ?

As this question has been raised and answered many times, and not everybody
has a copy of the Unicode Standard (TUS) or can read PDF files, I propose
paraphrasing Varada's question as a specific FAQ entry, to be added to
<http://www.unicode.org/unicode/faq/indic.html>, possibly as the first
question.

«
Q: I cannot find the "half forms" of Devanagari letters (or of any other
Indic script) in the Unicode charts. These characters are needed to form
words such as "patni".

A: Unicode does not encode half or subjoined letters for the scripts of
India. As in the ISCII standard, Unicode forms all "consonant clusters"
(such as the "tn" in "patni") by inserting the character "virama" (or
"halant") between the two relevant consonant letters.

For instance, the Devanagari syllable "tna" ("त्न") is encoded with the
following code points:

        U+0924 (त DEVANAGARI LETTER TA)
        U+094D (् DEVANAGARI SIGN VIRAMA = halant)
        U+0928 (न DEVANAGARI LETTER NA)
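As a minimal sketch (Python is used here only for illustration; the helper
name `describe` is hypothetical and not part of any standard), the sequence
above can be built from its code points and inspected with the standard
`unicodedata` module:

```python
import unicodedata

# Code points from the example above: TA, VIRAMA, NA.
TA, VIRAMA, NA = 0x0924, 0x094D, 0x0928

# The syllable "tna" is just these three characters in logical order.
tna = "".join(chr(cp) for cp in (TA, VIRAMA, NA))

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code, name in describe(tna):
    print(code, name)
```

The string holds three characters regardless of how many glyphs a given
font ends up using to draw it.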

These three characters will normally be displayed using the single glyph
<tna ligature> ("त्न"). But it is also possible that they are displayed
using a <half ta> glyph followed by a <full na> glyph ("त्‍न"), or even with
a <full ta> glyph combined with a <virama> glyph and followed by a <full na>
glyph ("त्‌न").

Which form is actually displayed is decided by an underlying software
module called the "display engine", which bases its decision on the
availability of glyphs in the font.

If the sequence U+0924, U+094D is not followed by another consonant letter
(such as "na"), it is always displayed as a <full ta> glyph combined with
the <virama> glyph ("त्").

Unicode provides a way to force the display engine to show a half letter
form. To do this, an invisible character called ZERO WIDTH JOINER should be
inserted after the virama:

        U+0924 (त DEVANAGARI LETTER TA)
        U+094D (् DEVANAGARI SIGN VIRAMA = halant)
        U+200D (zwj ZERO WIDTH JOINER)
        U+0928 (न DEVANAGARI LETTER NA)

This sequence is always displayed as a <half ta> glyph followed by a <full
na> glyph ("त्‍न"). Even if the consonant "na" is not present, the sequence
U+0924, U+094D, U+200D is displayed as a <half ta> glyph ("त्‍").
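A minimal sketch of how an editor might produce this encoding, assuming a
hypothetical helper `request_half_form` (the name and its interface are
illustrative only); it simply places ZERO WIDTH JOINER right after the
virama:

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def request_half_form(first, second=""):
    """Encode `first` so that its half form is requested.

    `second` is the optional consonant that follows; leaving it empty
    yields a standalone half-form request.
    """
    return first + VIRAMA + ZWJ + second

half_ta_na = request_half_form("\u0924", "\u0928")  # half ta + full na
half_ta = request_half_form("\u0924")               # half ta on its own

print([f"U+{ord(ch):04X}" for ch in half_ta_na])
```

The helper only builds the character sequence; drawing the half form is
still the job of the display engine and the font.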

Unicode also provides a way to force the display engine to show the <virama>
glyph. To do this, an invisible character called ZERO WIDTH NON-JOINER
should be inserted after the virama:

        U+0924 (त DEVANAGARI LETTER TA)
        U+094D (् DEVANAGARI SIGN VIRAMA = halant)
        U+200C (zwnj ZERO WIDTH NON-JOINER)
        U+0928 (न DEVANAGARI LETTER NA)

This sequence is always displayed as a <full ta> glyph combined with a
<virama> glyph and followed by a <full na> glyph ("त्‌न").
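As a rough sketch, a program can tell which display has been requested by
looking at the character that immediately follows each virama. The function
name `requested_forms` and its labels are assumptions made for this
illustration; the check is deliberately simplistic and inspects nothing but
that one following character:

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def requested_forms(text):
    """For each virama in text, report which display the encoding asks for."""
    result = []
    for i, ch in enumerate(text):
        if ch != VIRAMA:
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        if following == ZWJ:
            result.append("half form")
        elif following == ZWNJ:
            result.append("explicit virama")
        else:
            # No joiner present: the choice is left to the display engine.
            result.append("left to display engine")
    return result

print(requested_forms("\u0924\u094D\u200C\u0928"))
```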

For more detailed information, see Chapter 9 of the Unicode Standard, "South
and Southeast Asian Scripts"
<http://www.unicode.org/unicode/uni2book/ch09.pdf>.
»

I don't know whether all the glyphs in this e-mail will display correctly
for everybody. However, I can provide GIF images for all the examples.

_ Marco


