RE: What constitutes "character"?

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Fri Nov 09 2001 - 07:32:10 EST

Previous message: Dhrubajyoti Banerjee: "Re: What constitutes "character"?"
Maybe in reply to: Philipp Reichmuth: "What constitutes "character"?"
Next in thread: Arjun Aggarwal: "Re: What constitutes "character"? New Problem"
Reply: Arjun Aggarwal: "Re: What constitutes "character"? New Problem"

Dhrubajyoti Banerjee wrote:
> On Thu, 08 Nov 2001 Gaspar Sinai wrote :
> >I think that the Indian scripts deserve better character
> >assignment -
[...]
> However the idea you present, of pushing half characters,
> does not sound correct.

Actually, Gaspar's idea of encoding half letters (and forming full letters
by adding a danda) is *not* intrinsically worse than the ISCII/Unicode idea
of encoding full letters (and forming half letters by adding a halant).

But the fact is that it is not better either: it is just *equivalent*!

So, of course, there is no reason to throw away 10 years' work just because
one likes it better the other way round...

It would be like saying that India (or England, or Japan) drives on the
"wrong" side of the road and should switch to right-hand driving... Why? It
would cost billions to change all the roads and cars, but it would not make
traffic one bit better than it is now. At best, it would not make it worse.

> 'Rakt' consists of the characters
> Consonant ra + Consonant Ka + Halant + Consonant ta.
> This can be written in both the ways shown in the Jpeg file I
> have attached.

Right. *Normally*, an author has no reason to prefer one or the other form.

However, there may be some special cases when (s)he needs to force the
half-consonant form, and I would like to repeat once again that both Unicode
and ISCII are flexible enough to also fit these special needs.

In Unicode, the display of the half-consonant form can be forced by using
the ZWJ control; ISCII achieves the same thing using the INV control:

        Unicode: ka + halant + ZWJ + ta
        ISCII: ka + halant + INV + ta
        result: half ka glyph + full ta glyph

The same syntax can also be used to show the half consonant in isolation
(which could be needed in grammar books, etc.):

        Unicode: ka + halant + ZWJ
        ISCII: ka + halant + INV
        result: half ka glyph

If needed, the author can even force the rendering of a visible halant,
using the ZWNJ control in Unicode, or doubling the halant in ISCII:

        Unicode: ka + halant + ZWNJ + ta
        ISCII: ka + halant + halant + ta
        result: full ka glyph + halant glyph + full ta glyph
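
For anyone who wants to experiment, the three Unicode sequences above can
be spelled out as code points. The following is only a minimal Python
sketch: the constant and function names are illustrative (they come from
no library), and what actually appears on screen still depends on the font
and the rendering engine.

```python
# Devanagari code points and the two join controls.
KA = "\u0915"      # DEVANAGARI LETTER KA
TA = "\u0924"      # DEVANAGARI LETTER TA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (the halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def half_form(first, second=""):
    """Ask for the half form of `first`, optionally followed by `second`."""
    return first + HALANT + ZWJ + second


def visible_halant(first, second):
    """Ask for a full `first` with a visible halant, then a full `second`."""
    return first + HALANT + ZWNJ + second


def code_points(text):
    """Return the sequence in U+XXXX notation, for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(half_form(KA, TA)))       # half ka + full ta
print(code_points(half_form(KA)))           # half ka in isolation
print(code_points(visible_halant(KA, TA)))  # ka + visible halant + ta
```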

However, there is one thing that ISCII can do and current Unicode cannot:
displaying a "repha" in isolation.

In ISCII, a repha in isolation is encoded like this:

        ISCII: ra + halant + INV
        result: repha ra glyph

But the corresponding Unicode sequence yields a different result:

        Unicode: ra + halant + ZWJ
        result: eyelash ra glyph

Incidentally, the same visual result is also obtained with a different
Unicode sequence:

        Unicode: ra + nukta + halant + ZWJ
        result: eyelash ra glyph
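
The two sequences can be compared at the code point level with Python's
standard unicodedata module. This is a hedged sketch; the helper name
`describe` is illustrative. One fact that is not stated above is added
here: ra + nukta is the canonical decomposition of U+0931 DEVANAGARI LETTER
RRA, which the last lines check.

```python
import unicodedata

RA = "\u0930"      # DEVANAGARI LETTER RA
NUKTA = "\u093C"   # DEVANAGARI SIGN NUKTA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (the halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Both sequences are reported above to display as the eyelash ra glyph.
eyelash_plain = RA + HALANT + ZWJ
eyelash_nukta = RA + NUKTA + HALANT + ZWJ


def describe(text):
    """List each code point with its official Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for sequence in (eyelash_plain, eyelash_nukta):
    print(describe(sequence))

# U+0931 decomposes canonically into ra + nukta.
rra_decomposed = unicodedata.normalize("NFD", "\u0931")
print(rra_decomposed == RA + NUKTA)
```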

Of course, this is not a high priority issue, as a repha is normally only
needed within a word. Nevertheless, I think that it is time to devise a
solution for this little problem, because people may occasionally need to
show that symbol when dealing with grammar, typography, etc.

There are several possible solutions; here are a few examples:

1: ra + halant + ZWJ

2: ra + halant + ZWJ + ZWJ

3: ra + halant + LRM

Solution 1 is the cleanest one, of course: a repha glyph is what this
sequence should have produced in the first place, and the eyelash ra glyph
is still available with ra + nukta. However, this may be incompatible with
existing applications and fonts.

Solution 2 is not very elegant: as far as I know, there are no other uses
for a double Zero Width Joiner. However, it has the advantage of a smaller
impact on existing software.

Solution 3 sounds totally absurd to me. However, it is the solution which
is somehow implicitly suggested by the mapping of Apple Devanagari to
Unicode (see "2. Mapping the invisible consonant" in
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT).
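
To make the compatibility concern concrete, the three candidates can be
written as code points and compared with the sequence that currently
yields the eyelash ra. This is only a sketch: the names are illustrative,
and none of the candidates is an adopted encoding.

```python
RA = "\u0930"      # DEVANAGARI LETTER RA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (the halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
LRM = "\u200E"     # LEFT-TO-RIGHT MARK

# The sequence described above as currently displaying an eyelash ra.
CURRENT_EYELASH_RA = RA + HALANT + ZWJ

# The three candidate encodings for an isolated repha (proposals only).
CANDIDATES = {
    1: RA + HALANT + ZWJ,
    2: RA + HALANT + ZWJ + ZWJ,
    3: RA + HALANT + LRM,
}


def code_points(text):
    """Return the sequence in U+XXXX notation, for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def clashes_with_current(sequence):
    """True if the candidate reuses the existing eyelash ra sequence."""
    return sequence == CURRENT_EYELASH_RA


for number, sequence in CANDIDATES.items():
    if clashes_with_current(sequence):
        note = "reuses the eyelash ra sequence"
    else:
        note = "new sequence"
    print(number, code_points(sequence), note)
```

Only candidate 1 coincides with a sequence already in use, which is why it
is the one that risks breaking existing applications and fonts.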

By the way, the problem of stand-alone repha is also mentioned in the new
Indic FAQ (see "Q. Unicode doesn't have an "invisible letter" (INV) like
ISCII. How can I form the combinations that use INV in ISCII?" in
http://www.unicode.org/faq/indic.html).

However, I find that the problem is not dealt with properly in the answer:

<<
[...]

        ISCII              Unicode
        KA halant INV      KA virama ZWJ
        RA halant INV      RA[sup] (i.e., repha)

        The repha fits this pattern as well. Various formations involving
        superscript and subscript RA are illustrated in the "Rendering"
        subsection of Section 9.1 Devanagari of the Unicode Standard.
        Presumably isolated RA[sup] and RA[sub] forms would be similar to
        the ISCII forms:

        ISCII              Unicode
        INV halant RA      SPACE virama RA (RA[sub])

[...]
>>

This doesn't make much sense to me. The first line shows how a half ka
glyph may be encoded in ISCII and Unicode. The second line shows how a
repha glyph is encoded in ISCII, but the Unicode column just reads
"RA[sup]". But "RA[sup]" is not a sequence of code points: it is just the
synonym for "repha glyph" used in the Unicode book. So, in this context,
"RA[sup]" is nothing but a tautology.

Also the statement that "repha fits this pattern as well" doesn't make much
sense: if it fits the pattern, why is it not "RA virama ZWJ"? So, the
assumption implied by the adverb "presumably" must be partially rejected.
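
The pattern in the quoted table can be applied mechanically, which makes
the gap easy to see. The sketch below works on symbolic token names rather
than real ISCII bytes, and `map_inv_sequence` is a hypothetical helper
written for illustration, not part of any converter.

```python
def map_inv_sequence(tokens):
    """Map a three-token ISCII sequence containing INV to Unicode names."""
    if len(tokens) != 3 or tokens[1] != "halant":
        raise ValueError("expected: <letter> halant <letter>")
    first, _, last = tokens
    if last == "INV" and first != "INV":
        # consonant + halant + INV  ->  consonant + virama + ZWJ
        return [first, "virama", "ZWJ"]
    if first == "INV" and last != "INV":
        # INV + halant + consonant  ->  SPACE + virama + consonant
        return ["SPACE", "virama", last]
    raise ValueError("exactly one of the outer tokens must be INV")


print(map_inv_sequence(["KA", "halant", "INV"]))
print(map_inv_sequence(["RA", "halant", "INV"]))
print(map_inv_sequence(["INV", "halant", "RA"]))
```

Applied to RA, the first rule gives RA virama ZWJ, which is exactly the
sequence that yields the eyelash ra rather than the repha.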

BTW, the solution suggested for RA[sub] is very nice and well thought out.
I wonder, however, whether it is endorsed by the standard. There was no
such thing in TUS 3.0.

_ Marco


This archive was generated by hypermail 2.1.2 : Fri Nov 09 2001 - 08:45:43 EST