Fwd: Encoding Bengali Vowel forms

From: mijan mijan (meejan@hotmail.com)
Date: Fri Apr 28 2000 - 17:48:30 EDT


Here is Abdul's response to Marco:

Marco said that:
Abdul Malik wrote in his report:
>The problem
>Unicode allows conjunct part glyphs such as zophola to be
>formed only by placing the Virama sign (U+09CD) between two
>consonants. When ‘zophola AA_sign’ is placed after Letter_E
>or Letter_AA, it is not considered to form a conjunct with the
>vowel; it serves only as a vowel modifier.
>The zophola-AA sign cannot be included in Unicode as a vowel
>modifier sign, however, because when placed after a consonant it
>is considered to have different semantics. It would also be
>illegal to place it after a vowel sign.

Marco said:

Assumptions #1 and #3 are totally false.

1) "Unicode allows conjunct part glyphs [...] to be formed only by
placing the Virama sign between two consonants."

Abdul says: OK. Cut the word ‘only’ from the above sentence.

3) "It would also be illegal to place it [a virama] after a vowel
sign."

Abdul says: I was not referring to ‘[a virama]’; I was referring to the
‘zophola_AA sign’.

The point I was making is that this sequence is considered illegal in
Bengali, so it does not make sense to include zophola_AA in the Various signs
section of the Bengali Unicode range.

Assumption #2 is irrelevant: the precise grammatical or phonetic function of
characters is not an issue for encoding.

2) "When ‘zophola AA_sign’ is placed after Letter_E or Letter_AA, it
is not considered to form a conjunct with the vowel; it serves only
as a vowel modifier."

Abdul says: Irrelevant? The Unicode charts have been laid out, where
possible, with vowels, consonants, etc. grouped together. Also, the function
of characters is an issue for rendering engines.

Unicode does not have a "syntax" that
stipulates which sequences of characters are legal and which are not.

Abdul says: We need legal sequences defined in Unicode for Indic rendering;
otherwise, how are we to program our rendering mechanisms? A programmer
cannot be expected to be an expert on every language.
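To make the request concrete, here is a minimal sketch of what one "legal sequence" rule might look like, covering only the case Abdul names earlier (zophola_AA placed after a vowel sign). The vowel-sign list, the rule itself, and the `find_illegal_virama` helper are hypothetical illustrations, not an official Unicode or Bengali orthography table; as Marco notes, Unicode itself defines no such syntax.

```python
# Hypothetical, simplified rule: NOT an official Unicode or Bengali
# orthography table, just one example of a "legal sequence" check.
BENGALI_VOWEL_SIGNS = set(
    "\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC"
)
VIRAMA = "\u09CD"    # BENGALI SIGN VIRAMA
YA = "\u09AF"        # BENGALI LETTER YA
KA = "\u0995"        # BENGALI LETTER KA
LETTER_E = "\u098F"  # BENGALI LETTER E
AA = "\u09BE"        # BENGALI VOWEL SIGN AA


def find_illegal_virama(text: str) -> list[int]:
    """Return the indices of viramas that directly follow a vowel sign."""
    return [
        i
        for i in range(1, len(text))
        if text[i] == VIRAMA and text[i - 1] in BENGALI_VOWEL_SIGNS
    ]


# Consonant + vowel sign AA + zophola_AA: flagged under this rule.
print(find_illegal_virama(KA + AA + VIRAMA + YA + AA))  # [2]
# Letter E + zophola_AA: accepted.
print(find_illegal_virama(LETTER_E + VIRAMA + YA + AA))  # []
```

A table-driven check like this is easy to extend, but every entry in it is a linguistic decision, which is exactly the expertise Abdul says programmers should not have to supply on their own.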

Conclusion
>‘Vowel A_zophola_AA’ and ‘Vowel E_zophola_AA’ need to be
>included in the Bengali Unicode range as separate vowels.
>[...]

I have no opinion about whether this proposal should be accepted.

Abdul says: I need your opinions.

As I see it, zophola is just the special glyph used to represent the
sequence of these two characters:

09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA)
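One quick way to check this is to inspect the code points with Python's standard `unicodedata` module. The sketch below (the constant names are our own, chosen for illustration) builds the zophola sequence and shows that normalization leaves it as two characters: there is no precomposed zophola code point, so any ligature has to come from the font or the rendering engine.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
YA = "\u09AF"      # BENGALI LETTER YA

# Zophola is not a separate character: it is the virama + ya sequence.
ZOPHOLA = VIRAMA + YA

for ch in ZOPHOLA:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+09CD BENGALI SIGN VIRAMA
# U+09AF BENGALI LETTER YA

# Normalization does not fold the pair into a single code point.
assert unicodedata.normalize("NFC", ZOPHOLA) == ZOPHOLA
assert len(unicodedata.normalize("NFC", ZOPHOLA)) == 2
```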

The formation of this ligature can and should be totally *unconditional*: I
see no valid reason to bother checking for special conditions.

Abdul says: As I said, rendering engines need to check for special
conditions.

This means that:

- zophola (in *any* position) can be encoded as:
09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA)

And, consequently:

- zophola_aa can be encoded as:
09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE (B. VOWEL SIGN AA)

- vowel_a_zophola_aa can be encoded as:
0985 (B. LETTER A) + 09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE
(B. VOWEL SIGN AA)

- vowel_e_zophola_aa can be encoded as:
098F (B. LETTER E) + 09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE
(B. VOWEL SIGN AA)
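These encodings can be written out directly as strings. The sketch below is only an illustration (the constant names and the `describe` helper are hypothetical); it shows that both proposed vowel forms reduce to an independent vowel followed by the same three-character zophola_aa sequence.

```python
LETTER_A = "\u0985"       # BENGALI LETTER A
LETTER_E = "\u098F"       # BENGALI LETTER E
VIRAMA = "\u09CD"         # BENGALI SIGN VIRAMA
YA = "\u09AF"             # BENGALI LETTER YA
VOWEL_SIGN_AA = "\u09BE"  # BENGALI VOWEL SIGN AA

ZOPHOLA = VIRAMA + YA
ZOPHOLA_AA = ZOPHOLA + VOWEL_SIGN_AA

# The two proposed "vowels" are ordinary four-character sequences.
VOWEL_A_ZOPHOLA_AA = LETTER_A + ZOPHOLA_AA
VOWEL_E_ZOPHOLA_AA = LETTER_E + ZOPHOLA_AA


def describe(seq: str) -> str:
    """Return the sequence as a '+'-separated list of code points."""
    return " + ".join(f"{ord(c):04X}" for c in seq)


print(describe(VOWEL_A_ZOPHOLA_AA))  # 0985 + 09CD + 09AF + 09BE
print(describe(VOWEL_E_ZOPHOLA_AA))  # 098F + 09CD + 09AF + 09BE
```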

>The problem with [this] is that the string would have
>to be specifically looked for. [...]

Problem? Why a problem? The main job of a rendering engine is to look up the
glyphs that correspond to strings of one or more characters. Why should
*this* particular lookup be a problem?
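The lookup described here can be sketched as a greedy longest-match mapping from character strings to glyphs. The glyph table below is entirely hypothetical (real fonts keep such substitutions in structures like OpenType GSUB tables, and the glyph names here come from no actual font). Because the zophola entry is keyed only on virama + ya, it fires in any position without special-case checks, which is the unconditional formation proposed above.

```python
# Hypothetical glyph table: maps character sequences to glyph names.
# The names are placeholders, not taken from any actual font.
GLYPHS = {
    "\u0985": "a",
    "\u098F": "e",
    "\u09AF": "ya",
    "\u09BE": "aa_sign",
    "\u09CD": "virama",
    "\u09CD\u09AF": "zophola",
}

MAX_KEY = max(len(k) for k in GLYPHS)


def lookup_glyphs(text: str) -> list[str]:
    """Greedy longest-match lookup of character sequences to glyphs."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in GLYPHS:
                glyphs.append(GLYPHS[chunk])
                i += length
                break
        else:
            # No entry at all: emit a placeholder for this one character.
            glyphs.append(".notdef")
            i += 1
    return glyphs


print(lookup_glyphs("\u0985\u09CD\u09AF\u09BE"))
# ['a', 'zophola', 'aa_sign']
```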

Abdul says:

OK, OK. I don’t want to argue with you, but I need official guidance.

You must remember that this sequence, Vowel_A Virama Letter_Ya Vowelsign_AA,
is considered a vowel in its own right, or at least a single syllable (i.e.
it has to be recognized as such by Indic rendering). So suppose I want to
place a Candrabindu on top of it. Do I type (a) Vowel_A Virama Letter_Ya
Vowelsign_AA Candrabindu, or (b) Vowel_A Candrabindu Virama Letter_Ya
Vowelsign_AA, or something else?
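The ambiguity can be checked against Unicode normalization. This sketch (constant names are illustrative) builds both candidate spellings. Because BENGALI SIGN CANDRABINDU has canonical combining class 0, normalization never reorders it relative to the virama, so NFC and NFD keep the two spellings distinct. This does not say which order is correct; it only shows that Unicode normalization will not treat them as equivalent.

```python
import unicodedata

LETTER_A = "\u0985"       # BENGALI LETTER A
VIRAMA = "\u09CD"         # BENGALI SIGN VIRAMA
YA = "\u09AF"             # BENGALI LETTER YA
VOWEL_SIGN_AA = "\u09BE"  # BENGALI VOWEL SIGN AA
CANDRABINDU = "\u0981"    # BENGALI SIGN CANDRABINDU

# Option (a): candrabindu after the whole cluster.
option_a = LETTER_A + VIRAMA + YA + VOWEL_SIGN_AA + CANDRABINDU
# Option (b): candrabindu directly after the independent vowel.
option_b = LETTER_A + CANDRABINDU + VIRAMA + YA + VOWEL_SIGN_AA

# Combining classes: candrabindu is 0, virama is 9.
print(unicodedata.combining(CANDRABINDU), unicodedata.combining(VIRAMA))  # 0 9

for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, option_a) == unicodedata.normalize(form, option_b)
    print(form, same)
# NFC False
# NFD False
```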

You see? There will have to be sequences that are considered illegal where
Indic scripts are concerned. Otherwise, people will spell the same word in
more than one way. A good example is the Devanagari vowel sign I: some people
using their current software have to type it before the consonant rather
than after. If you said that Vowel_I should be rendered unconditionally, we
would be in a real mess with regard to alphabetic sorting.
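The sorting concern can be illustrated with Devanagari KA + vowel sign I. This sketch assumes a naive code-point sort (real collation, such as the Unicode Collation Algorithm, is more elaborate) and a hypothetical `is_ill_formed` check. It shows that the visually ordered spelling does not sort next to the logically ordered one, and that a simple rule can flag it.

```python
import unicodedata

KA = "\u0915"             # DEVANAGARI LETTER KA
KHA = "\u0916"            # DEVANAGARI LETTER KHA
VOWEL_SIGN_I = "\u093F"   # DEVANAGARI VOWEL SIGN I
VOWEL_SIGN_AA = "\u093E"  # DEVANAGARI VOWEL SIGN AA

# Logical (Unicode) order: the sign follows its consonant in the text,
# even though it is displayed to the left of the consonant.
ki_logical = KA + VOWEL_SIGN_I
# Visual order, as typed with some older software.
ki_visual = VOWEL_SIGN_I + KA
khaa = KHA + VOWEL_SIGN_AA


def is_ill_formed(word: str) -> bool:
    """Hypothetical rule: flag a word that starts with a combining mark."""
    return bool(word) and unicodedata.category(word[0]) in ("Mn", "Mc")


# A plain code-point sort separates the two spellings of the same word.
ordered = sorted([ki_visual, khaa, ki_logical])
assert ordered == [ki_logical, khaa, ki_visual]
print([is_ill_formed(w) for w in ordered])  # [False, False, True]
```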

Best regards
Abdul



