From: Michael Everson (everson@evertype.com)
Date: Wed Mar 05 2003 - 15:19:26 EST
Andy, the ya-phalaa is a presentation form of conjoined YA, which is
produced in Unicode by the sequence VIRAMA + YA. Encoding it as
anything else makes very little sense at all. However it is
pronounced today in Bengali, and however weird you feel about its
being applied to an initial vowel, the fact is that it is still a
presentation form of conjoined YA, and it should be encoded as such.
Consider the fact that the Bhagavadgita is available in Sanskrit in
Bengali script. This will certainly contain many, many examples of
consonant clusters in -YA. These will all be encoded as VIRAMA + YA,
not as some independent form of ya-phalaa.
It is easy to point fingers about a mismatch that someone like me
makes, but the Unicode encoding model for Indic scripts is very
robust, and we do our best to apply it correctly.
Your proposed combining ya-phalaa will do Bengali no service, as it
will introduce multiple spellings for consonant clusters in -YA. I
have already stated on this forum:
"For example, in Sanskrit and Bengali, we have the word pratyeka
'each, every'. This is derived from the Sanskrit root prati
(expressing likeness or comparison) plus eka 'one'. In Sanskrit
orthography i + e becomes ye and is so written. Now in Bengali this
word also exists and in both languages what is written is PA + VIRAMA
+ RA + TA + VIRAMA + YA + E + KA."
It would be absurd -- and wrong -- to spell the Sanskrit word one way
and the Bengali word another, especially as it is the same word.
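
The spelling quoted above can be written out code point by code point. The following is a minimal Python sketch, assuming the standard Unicode character names; the helper `spell` and the list `PRATYEKA` are hypothetical names introduced only for this illustration. It builds the same logical sequence once in Bengali script and once in Devanagari.

```python
import unicodedata

def spell(script, names):
    """Build a string from Unicode character names, each prefixed with the script name."""
    return "".join(unicodedata.lookup(f"{script} {n}") for n in names)

# The sequence quoted above: PA + VIRAMA + RA + TA + VIRAMA + YA + E + KA
PRATYEKA = ["LETTER PA", "SIGN VIRAMA", "LETTER RA", "LETTER TA",
            "SIGN VIRAMA", "LETTER YA", "VOWEL SIGN E", "LETTER KA"]

bengali = spell("BENGALI", PRATYEKA)
devanagari = spell("DEVANAGARI", PRATYEKA)

# Print the code points of each spelling side by side.
for label, word in (("Bengali", bengali), ("Devanagari", devanagari)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in word))
```

Both strings are eight code points long and follow the same order of letters and viramas; only the script block differs.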
>IMHO, TUS needs solid rules; Exceptions, hacks, patches, or workarounds
>should definitely be avoided wherever possible. (If you care to look
>back in the mailing list archives a few years, you will see that the
>"a+Virama+Ya+aa" kludge was originally proposed as a workaround due to
>the lack of a separate encoded letter)
It isn't a kludge. It is a consistent application of the rules.
Ya-phalaa is a presentation form of YA in conjunction with a
preceding consonant or -- a Bengali innovation -- an independent
vowel.
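
To make the two cases concrete, here is a small Python sketch that looks for VIRAMA + YA in Bengali text and reports what precedes it. The function names `ya_phalaa_bases` and `is_independent_vowel` are hypothetical, and the vowel test is a simplifying assumption: it checks only the main run of independent vowels, U+0985 to U+0994.

```python
import unicodedata

VIRAMA = unicodedata.lookup("BENGALI SIGN VIRAMA")  # U+09CD
YA = unicodedata.lookup("BENGALI LETTER YA")        # U+09AF

def ya_phalaa_bases(text):
    """Return the character that precedes each VIRAMA + YA pair in text."""
    bases = []
    for i in range(1, len(text) - 1):
        if text[i] == VIRAMA and text[i + 1] == YA:
            bases.append(text[i - 1])
    return bases

def is_independent_vowel(ch):
    # Assumption: only the main run U+0985..U+0994 is treated as independent vowels.
    return 0x0985 <= ord(ch) <= 0x0994

# A + VIRAMA + YA + AA: ya-phalaa on an independent vowel
a_ya_aa = "".join(unicodedata.lookup(n) for n in (
    "BENGALI LETTER A", "BENGALI SIGN VIRAMA",
    "BENGALI LETTER YA", "BENGALI VOWEL SIGN AA"))

# TA + VIRAMA + YA: ya-phalaa on a consonant
tya = "".join(unicodedata.lookup(n) for n in (
    "BENGALI LETTER TA", "BENGALI SIGN VIRAMA", "BENGALI LETTER YA"))

for sample in (a_ya_aa, tya):
    for base in ya_phalaa_bases(sample):
        kind = "independent vowel" if is_independent_vowel(base) else "other"
        print(unicodedata.name(base), "->", kind)
```

In both samples the ya-phalaa is found by the same VIRAMA + YA test; only the class of the preceding character differs.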
In keeping this stance, Andy, I am defending the Unicode Standard's
encoding principles. The Indic encoding model is constantly under
attack from people who want explicit rephas, explicit half-forms,
explicit ya-phalaas, and all sorts of other explicit things, which,
were we to encode them, would make the standard very much worse than
it is.
To reiterate our consistency in using this model, I will give you a
Malayalam example.
NA + VIRAMA + MA --> NMA (a single conjunct)
NA + VIRAMA + ZWNJ + MA --> NMA (with a visible virama breve above and between)
NA + VIRAMA + ZWJ + MA --> NMA (with the cillaks.aram virama curl)
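
The three Malayalam sequences above can be sketched in Python as follows, assuming the standard Unicode character names. The dictionary `forms` and the helper `strip_joiners` are hypothetical names used only for illustration; how each sequence is actually displayed depends on the font and rendering engine.

```python
import unicodedata

NA = unicodedata.lookup("MALAYALAM LETTER NA")        # U+0D28
VIRAMA = unicodedata.lookup("MALAYALAM SIGN VIRAMA")  # U+0D4D
MA = unicodedata.lookup("MALAYALAM LETTER MA")        # U+0D2E
ZWNJ = unicodedata.lookup("ZERO WIDTH NON-JOINER")    # U+200C
ZWJ = unicodedata.lookup("ZERO WIDTH JOINER")         # U+200D

# The three sequences from the example, keyed by the form each requests.
forms = {
    "conjunct": NA + VIRAMA + MA,
    "visible virama": NA + VIRAMA + ZWNJ + MA,
    "virama curl": NA + VIRAMA + ZWJ + MA,
}

def strip_joiners(s):
    """Remove ZWJ and ZWNJ, leaving only the underlying letters and virama."""
    return s.replace(ZWNJ, "").replace(ZWJ, "")

for label, seq in forms.items():
    print(f"{label}:", " ".join(f"U+{ord(c):04X}" for c in seq))
```

With the joiners removed, all three sequences reduce to the same NA + VIRAMA + MA; the joiner changes only the requested presentation, not the underlying letters.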
We prefer to apply this consistency to Bengali as well. Thank you for
correcting my error earlier. That kind of feedback is helpful.
Beating us up because you don't like our encoding model isn't.
-- Michael Everson * * Everson Typography * * http://www.evertype.com