Re: Encoding Bengali Vowel Forms Again

From: Abdul Malik (meejan@hotmail.com)
Date: Mon May 15 2000 - 16:56:30 EDT


Dear all,

Thank you all for your responses to my encoding worries.
I am only sorry that it has taken me so long to respond to them all.
The main reason for the delay is the difficulty I have in explaining my
thoughts.
Please be patient in trying to understand the points below, as I may
not have made myself totally clear.

First, let me give my thoughts on unconditional rendering of Virama Ya as
the Zophola glyph.

Marco Cimarosti said:
>The formation of this ligature can and should be totally *unconditional*:

and Kenneth Whistler said:
>Or alternatively,
>as Marco pointed out, a doublet check on <halant ya-> combinations could
>be implemented to unconditionally use the zophola form

also
Kenneth Whistler said:

>Finally, regarding the question that was raised about how to represent
>a candrabindu or other combining mark for the sequence, I think the
>answer is fairly simple:
>
> Vowel_A_zophola_AA + candrabindu = 0985 09CD 09AF 09BE 0981
> ( a- halant ya -aa candrabindu )

Well, the answer may be simple to some, but let me first explain that
the appearance of A_Zophola_AA_candrabindu
(i.e. printed, or on screen after being rendered) is that of
LetterA with candrabindu on top, followed by the zopholaAA glyph.
So it would be natural for someone without a knowledge of Unicode
to try to form A_Zophola_AA_candrabindu by using the sequence
LetterA candrabindu virama Ya VowelSignAA.

Okay, let's see what happens if someone tries to enter this sequence.

LetterA Candrabindu - This should be rendered correctly, as this is a valid
syllable,
i.e. rendered as: LetterA_Candrabindu.
Now we follow this with Virama Ya VSignAA.
If you are using unconditional rendering of Virama Ya as Zophola,
this sequence will be rendered as the user expects,
i.e. rendered the same as Vowel_A_zophola_AA + candrabindu.
The user will have no way of knowing that he has made a typo!
It would be much better if this sequence were rendered as:
LetterA_Candrabindu, Virama, Ya_VSignAA.
This is why I rule out unconditional rendering of Virama Ya as Zophola.
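
As a rough illustration of the argument above, here is a toy Python sketch.
It is not a real rendering engine: the function render(), the cluster layout
[base, marks_above, post_base] and the "conditional" rule (ligate only when
the virama directly follows a letter) are all assumptions made up for this
example. It only shows that an unconditional Virama Ya rule produces the same
picture for the intended sequence and for the mistyped one.

```python
# Bengali code points used in this thread.
A, CANDRABINDU, VIRAMA, YA, AA = "\u0985", "\u0981", "\u09CD", "\u09AF", "\u09BE"
NAMES = {A: "LetterA", CANDRABINDU: "Candrabindu", VIRAMA: "Virama",
         YA: "Ya", AA: "VSignAA"}

INTENDED = A + VIRAMA + YA + AA + CANDRABINDU   # 0985 09CD 09AF 09BE 0981
TYPO = A + CANDRABINDU + VIRAMA + YA + AA       # what a naive user might type


def render(text, unconditional):
    """Toy model: return clusters as [base, marks_above, post_base]."""
    clusters = []
    i = 0
    while i < len(text):
        ch = text[i]
        is_virama_ya = text[i:i + 2] == VIRAMA + YA
        # Assumed conditional rule: ligate only directly after a letter.
        after_letter = i > 0 and text[i - 1] in (A, YA)
        if is_virama_ya and clusters and (unconditional or after_letter):
            clusters[-1][2].append("Zophola")
            i += 2
            continue
        if ch == CANDRABINDU and clusters:
            clusters[-1][1].append("Candrabindu")   # drawn above the base
        elif ch == AA and clusters:
            clusters[-1][2].append("VSignAA")       # drawn after the base
        else:
            # A letter, or a stray virama shown visibly on its own.
            clusters.append([NAMES[ch], [], []])
        i += 1
    return clusters


# Unconditional rule: the typo is indistinguishable from the intended text.
print(render(INTENDED, unconditional=True) == render(TYPO, unconditional=True))
# Conditional rule: the typo shows up as LetterA_Candrabindu, Virama, Ya_VSignAA.
print(render(TYPO, unconditional=False))
```

Under these assumptions the first line prints True, and the second shows the
three separate clusters that would alert the user to the mistake.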

Next we have the suggestion that:
LetterA Virama Ya VSignAA is to be rendered as A_Zophola_AA.

I have three concerns with this.

The main concern is with sorting.

To sort Indic text (in the scripts I know) there are some basic rules.
Two of them are:
X + Virama < X
X + VowelSign > X
(where X is a Letter form)

Now with a sequence such as Ka Virama Ya VSignAA, this works fine.

Let's try it with LetterA Virama Ya VSignAA:

LetterA Virama: maybe comes before LetterA?
Ya VSignAA: Yes, this would normally come after Ya.

But no matter how you analyse this sequence, you will never arrive
at the correct sort order, which is in fact:
LetterA < LetterA_Zophola_AA < LetterAA.
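
The two rules can be turned into a small Python sketch to see where the
proposed sequence lands. This is only a toy model: sort_key() and its rank
values (0 for letter + virama, 1 for a bare letter, 2 for letter + vowel
sign) are assumptions for illustration, not an actual collation algorithm.

```python
A, LETTER_AA, KA, YA = "\u0985", "\u0986", "\u0995", "\u09AF"
VIRAMA, VSIGN_AA = "\u09CD", "\u09BE"


def sort_key(text):
    """Toy key: one (letter, rank, sign) tuple per letter.

    rank 0 = letter + virama, 1 = bare letter, 2 = letter + vowel sign,
    so that X + Virama < X and X + VowelSign > X.
    """
    key = []
    i = 0
    while i < len(text):
        letter = text[i]
        following = text[i + 1:i + 2]
        if following == VIRAMA:
            key.append((letter, 0, ""))
            i += 2
        elif following == VSIGN_AA:
            key.append((letter, 2, following))
            i += 2
        else:
            key.append((letter, 1, ""))
            i += 1
    return key


# The two basic rules hold for Ka.
print(sort_key(KA + VIRAMA) < sort_key(KA) < sort_key(KA + VSIGN_AA))

# LetterA Virama Ya VSignAA is analysed as (A + virama), (Ya + AA) ...
proposed = A + VIRAMA + YA + VSIGN_AA
print(sort_key(proposed))
# ... so it lands before LetterA, not between LetterA and LetterAA.
print(sort_key(proposed) < sort_key(A))
```

In this model the proposed sequence sorts before LetterA, which is not the
wanted order LetterA < LetterA_Zophola_AA < LetterAA.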

Now let's deal with rendering it.
Let's take the first part:
LetterA Virama Ya.

The question is, should this be rendered as A_Zophola?

If it is, it would indicate to the user that this is a correct syllable, or
at least a valid sequence. It is not a valid syllable in Bengali. In all
other sequences that I know of, if you have created a valid syllable then
the parts combine correctly on screen, so that you can tell that you have
entered the sequence correctly. So if it is to be rendered as A_Zophola, it
will have to be treated as an exception, because it will have to be
treated as an invalid sequence by all Indic processes apart from
rendering. (I am thinking of sorting and editing here.) Also, if it is
rendered correctly, it would imply that one could now enter a candrabindu
etc., which is obviously not correct.

If it is not rendered as A_Zophola but as A_Virama_Ya, it would
indicate to the user that he had made a typo.

Editing:
As far as I know, it is proposed that the cursor will jump from
syllable to syllable whilst editing Unicode-encoded Indic text. This is
because of the problems that could occur if a user tried to delete, for
example, the left part of a split vowel sign (there must be other problems).
In any case, I hope you all agree that syllables have to be recognised
correctly for Indic processes to function. The question I have is: What
should happen if one tries to delete A_Zophola_AA? Should it disappear
completely, or should just the AA part go? If the AA part goes, what form
would the A_Zophola have on screen? (See the problems mentioned earlier.)
If it goes completely, that would be unusual behaviour, e.g. one would
expect Ka_Zophola_AA followed by the delete key to be displayed as
Ka_Zophola (a valid syllable).
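
To make the deletion question concrete, here is a minimal Python sketch. It
assumes that the delete key removes only the last code point, which is just
one possible editing behaviour; the helper delete_last_code_point() is
hypothetical and not taken from any editor.

```python
import unicodedata

KA, A = "\u0995", "\u0985"
VIRAMA, YA, VSIGN_AA = "\u09CD", "\u09AF", "\u09BE"


def delete_last_code_point(text):
    """Assumed delete behaviour: drop only the final code point."""
    return text[:-1]


def describe(text):
    """List the Unicode character names in a string."""
    return [unicodedata.name(ch) for ch in text]


ka_left = delete_last_code_point(KA + VIRAMA + YA + VSIGN_AA)
a_left = delete_last_code_point(A + VIRAMA + YA + VSIGN_AA)

print(describe(ka_left))  # Ka Virama Ya: displays as Ka_Zophola
print(describe(a_left))   # LetterA Virama Ya: the sequence questioned above
```

Both deletions leave a letter followed by Virama Ya, but only the Ka case
leaves what the text above calls a valid syllable.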

I think that everyone has agreed that encoding in the private use area is
not a good idea so I won't discuss this.
So I still think that including A_Zophola_AA in the Bengali block is a good
idea, but until such time that more than one other person agrees with this,
I shall not discuss the location within the block.

There is one other solution. It is the solution that ISCII implementers use.
That is to put A_Zophola_AA at position U+0991, i.e. the Bengali position
corresponding to Devanagari CandraO (U+0911).
You would then have to call A_Zophola_AA a "glyph variant of Dev CandraO
used in Bengali". I do not think that this would be looked on kindly by
some, but it is available for use now.

Thank you for your patience in reading this.

Any comments will be gratefully received.

Best regards,

Abdul




This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT