RE: Encoding Bengali Vowel forms (again)

From: Apurva Joshi (apurvaj@microsoft.com)
Date: Tue May 02 2000 - 20:29:04 EDT


Please see my response below.
Thanks,
-apurva

-----Original Message-----
From: Marco.Cimarosti@icl.com [mailto:Marco.Cimarosti@icl.com]
Sent: Tuesday, May 02, 2000 10:17 AM
To: Apurva Joshi
Cc: unicode@unicode.org
Subject: RE: Encoding Bengali Vowel forms (again)

> [apurva:] A Virama can be input by a user anywhere in a sequence of
> characters. However, that does not mean it would always result in output
> that is meaningful to the user receiving it, or considered logical in an
> [Indic] script.

I agree. But, as a programmer, I tend to attach great importance to absurd
or limiting cases as well. This is because programs run unattended (i.e. at
runtime, the programmer is not there to watch what happens), so robust
software must be able to make its decisions in all cases, and it *cannot*
resort to common sense or logic (because a program is just a soft machine,
not a logical being).
[apurva:] Very true. All that my response above intended was that a user
could input a number of halants one after the other, for whatever reason. We
would still need to ensure that the software does not register this as an
'unidentifiable' or illogical case, and does not stop responding or crash.
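
As a minimal illustration of the kind of tolerance meant here (a Python
sketch; find_halant_runs is a hypothetical helper, not any real shaping
engine's API):

    # Hypothetical helper: scan Bengali text and report runs of repeated
    # halants instead of treating them as a fatal error.
    BENGALI_VIRAMA = "\u09CD"   # BENGALI SIGN VIRAMA (halant)

    def find_halant_runs(text):
        """Return (index, length) for every run of two or more halants."""
        runs, i = [], 0
        while i < len(text):
            if text[i] == BENGALI_VIRAMA:
                j = i
                while j < len(text) and text[j] == BENGALI_VIRAMA:
                    j += 1
                if j - i >= 2:
                    runs.append((i, j - i))
                i = j
            else:
                i += 1
        return runs

    # A consonant followed by three halants is odd input, but it is
    # reported, not rejected:
    sample = "\u0995" + BENGALI_VIRAMA * 3   # KA + halant x 3
    print(find_halant_runs(sample))          # [(1, 3)]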

> I have known at least the following occurrences of the virama as
> illogical when constructing syllables in Indic:
> 1. when it follows another virama [I guess Sinhala permits this]
> 2. when it follows a vowel sign
> 3. when it follows an independent vowel
> 4. when it occurs on its own [i.e. when it's not for display purposes]

Number 4 is not illogical: it is just a bit unusual. Any grapheme may have
its "meta-linguistic" value (or should I say "meta-graphic"?). That is, when
writing itself is the subject of a written text, a sign may simply stand for
itself. E.g., if it weren't for current technological limitations (e-mail
systems, fonts, etc.), we would be using a lot of stand-alone Bengali halants
within *this* discussion, because we are talking about the Bangla writing
system.
[apurva:] Sure, I agree entirely. #4 only implies that the virama on its own
will not constitute a syllable; hence I would consider it illogical only for
the purpose of shaping a syllable. However, a user can still use it for
display purposes, as you mention above, as well as in the odd case of a
string of halants [in my response above].

Number 3 is actually the subject of this discussion :-) If everybody agreed
that this usage is "illogical", the whole discussion would be pointless.
[apurva:] Yes.
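
For concreteness, the four cases listed above can be sketched in a few lines
of Python (Bengali only; the character classes below are rough assumptions
and this is nothing like a full cluster grammar):

    VIRAMA = "\u09CD"                                        # BENGALI SIGN VIRAMA
    # Rough classes; the ranges include a few unassigned points, which is
    # harmless for a sketch.
    INDEP_VOWELS = {chr(c) for c in range(0x0985, 0x0995)}   # LETTER A .. LETTER AU
    VOWEL_SIGNS  = ({chr(c) for c in range(0x09BE, 0x09C5)}  # AA .. VOCALIC RR signs
                    | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"})

    def virama_context(text, i):
        """Classify the virama at text[i] against the four cases above."""
        assert text[i] == VIRAMA
        if i == 0:
            return "case 4: stands on its own (nothing to attach to)"
        prev = text[i - 1]
        if prev == VIRAMA:
            return "case 1: follows another virama"
        if prev in VOWEL_SIGNS:
            return "case 2: follows a vowel sign"
        if prev in INDEP_VOWELS:
            return "case 3: follows an independent vowel"
        return "regular: follows a consonant"

    print(virama_context("\u0995\u09CD", 1))   # KA + virama       -> regular
    print(virama_context("\u0985\u09CD", 1))   # LETTER A + virama -> case 3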

> Given the above purposes of the Virama, if the following
> occurs [not just in Bengali]:
> LetterA + virama + consonantYa + vowelSignAa
> it would imply the removal of LetterA [the full vowel]
> itself. Going by the rules, this would not be logical.

You have a good point. But you should not forget that this is (or was) a
rather *exceptional* sequence, used only in words of foreign origin.
[apurva:] English words are used in Bengali quite frequently, and hence I am
not sure whether it would be appropriate to classify this as an
exception...
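
Spelling the disputed sequence out as code points makes the tension visible
(Python, purely illustrative; the closing comment paraphrases the two
readings being debated here):

    import unicodedata

    # LetterA + virama + consonantYa + vowelSignAa, in Bengali code points.
    sequence = "\u0985\u09CD\u09AF\u09BE"
    for ch in sequence:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # U+0985  BENGALI LETTER A
    # U+09CD  BENGALI SIGN VIRAMA
    # U+09AF  BENGALI LETTER YA
    # U+09BE  BENGALI VOWEL SIGN AA
    #
    # Read strictly, the virama should remove the inherent vowel of what
    # precedes it, which makes no sense after an independent vowel; read as
    # ya-phola, the trailing three characters form a unit applied to LETTER A.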

Transliterations always work this way, in any language: the spelling rules
of the "host" language are forced and stretched to accommodate unusual
sounds imported from the "guest" language.

Moreover, if you forget Unicode, ISCII, computers, and just look at printed
text, you will see that the idea behind this spelling is not as illogical as
you say.
[apurva:] Yes. I am aware of all that you mention above. I do not consider
the use of Ya_phola inferior or illogical in any way. The discussion is for
the purpose of finding out whether it would be better to encode
LetterA_YaPhola-VowelsignAa and LetterE_YaPhola-VowelsignAa as 'specific
additions' to the Bengali block. In doing so, we might avoid the new/future
trouble that could arise if each Indic shaping engine were left to shape this
sequence by its own rules.

Try imagining what the first people using it had in mind:
1) The sequence "zophola" (U+09CD U+09AF) + the "aa" matra (U+09BE) is used
for transcribing the English "a" in "bat". This zophola_aa can be seen as a
special "composite" matra to write a new Bengali sound, imported from
English.
[apurva:] We can treat 'zophola-aa' as a new composite matra; however, that
would require us to add a new 'independent vowel' for this matra, because a
matra can exist only if there is a corresponding 'independent vowel'.
Providing LetterAYaPholaVowelSignAa and LetterEYaPholaVowelSignAa in the
'specific additions' will not require any of this, and will also identify
this as a 'special case' in Bengali.
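
A sketch of the "composite matra" reading, treating zophola-aa as one unit
during a deliberately naive segmentation pass (Python; the helper and the
sample spelling are hypothetical, for illustration only):

    ZOPHOLA_AA = "\u09CD\u09AF\u09BE"   # virama + YA + AA matra

    def split_keeping_zophola_aa(text):
        """Split text into pieces, keeping the zophola-aa sequence intact."""
        pieces, i = [], 0
        while i < len(text):
            if text.startswith(ZOPHOLA_AA, i):
                pieces.append(ZOPHOLA_AA)
                i += len(ZOPHOLA_AA)
            else:
                pieces.append(text[i])
                i += 1
        return pieces

    # "bat"-style spelling: BA + zophola-aa + a final consonant (illustrative).
    word = "\u09AC" + ZOPHOLA_AA + "\u099F"
    for piece in split_keeping_zophola_aa(word):
        print(" ".join(f"U+{ord(c):04X}" for c in piece))
    # U+09AC
    # U+09CD U+09AF U+09BE
    # U+099F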

2) There is no vowel letter for writing an English "a" at the beginning of a
word. This is a problem.
[apurva:] One could use LetterA in words like 'another', 'atone' etc.

3) There is no real vowel letter even for Bengali "aa" at the beginning of a
word! The character that has been encoded as the "aa" vowel letter (U+0986)
is actually a vowel letter "a" (U+0985) with the "aa" matra (U+09BE).
[apurva:] Indic scripts are syllabic in nature, where smaller atomic units
combine with others to form larger units. More often than not, generating a
lengthened vowel means either
(a) adding an additional vowel sign, e.g. LetterAa in Devanagari, Bengali,
etc., or
(b) making a change in the letter itself, e.g. LetterU and LetterUu in
Bengali.
In both cases the vowel letters are 'real'; they just might not have vastly
different forms.

4) The same technique can be used to fix the problem at point 2.
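
One encoding-level footnote on points 3 and 4 (Python; the normalization
facts are checkable with the standard library, and the spelling at the end is
simply the sequence already under discussion, not a recommendation):

    import unicodedata

    # Whatever its visual history, U+0986 (LETTER AA) is atomic in Unicode:
    # it has no canonical decomposition, and NFC does not fuse
    # LETTER A + VOWEL SIGN AA into it.
    print(repr(unicodedata.decomposition("\u0986")))                  # ''
    print(unicodedata.normalize("NFC", "\u0985\u09BE") == "\u0986")   # False

    # So the word-initial English "a" of point 4 would be spelt with the
    # explicit sequence LetterA + zophola + AA matra:
    word_initial_a = "\u0985\u09CD\u09AF\u09BE"
    print(" ".join(f"U+{ord(c):04X}" for c in word_initial_a))
    # U+0985 U+09CD U+09AF U+09BE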

> [Peter Constable] As far as using the PUA is concerned, yes, that's an
> option.
> [apurva:] I might not look favourably on the use of PUA for this.

I am with Apurva. We are talking about an encoded script, and a very
important one, used for languages spoken by hundreds of millions of people.
So I would leave the PUA out of this.
[apurva:] I am glad that you too see merit in not using the PUA for this.
Thanks.

> [apurva:] Pardon my being blunt here. But Indic scripts [like Malayalam]
> have had to see a change in orthography and typographical quality [some,
> sadly, for the worse] due to some interim solutions [constraints in some
> earlier typesetting systems]. Unfortunately, these solutions have not been
> looked at as interim but as permanent [they have existed for decades]. As a
> result, a whole generation of young people in India who have not had the
> opportunity to see the original orthography of the script think that the
> current, incorrectly implemented solutions 'are' the way it has to be!
> Hence it would be prudent of us to try our best to look also at the
> long-term effects that technology [here, an encoding standard] tends to
> usher in with it.

It is no news and no scandal that writing technology can influence the
appearance (and even the functionality!) of writing systems. This has always
happened throughout history. The changes introduced by technology can be good
or bad; good for some people and bad for others; bad for this generation but
good for the next, etc.

A lot of examples come to my mind of how technology influenced writing in
the past, so I must limit myself to only a few:

- In ancient Rome, the Latin alphabet was often carved in stone. Using a
chisel, it was almost impossible to "close" the stems of letters while
maintaining the same width: the end of a stem was always a little bit wider.
Thousands of years later, in the AAT/OT era, most Latin fonts still have
these "serifs" derived from stone-carving technology.

- At a certain point, all the circles and narrow curves in Chinese ideographs
became squared. This change was caused by the adoption of paper, soft-tipped
brushes and liquid ink. With such a writing technology, it is a problem to
trace circles, because the centrifugal force tends to shoot drops of ink all
around the circle.

- European alphabets (Latin, Greek, Cyrillic, etc.) used to have a lot of
"ligatures", like the Indic scripts. With the introduction of movable-type
printing, all these ligatures had to be anticipated and separately produced
as pre-composed types. This was a waste of time and money, so ligatures came
to be used less and less. Today only a few of them survive, and not even in
all languages (e.g. "&", "ß", "@", etc.).

- The Arabic script always had a characteristic slant: the end of a word is
often much lower than its start. The technological constraints introduced by
printing caused modern Arabic text to become much more horizontal.

- The letters in most scripts have variable widths: e.g. an "i" is much
narrower than a "w". But the introduction of typewriters led to the invention
of "fixed pitch" fonts, which are still very common today.

All these examples started as errors or approximations but, today, are
considered regular features of these scripts. In many cases, these features
are even considered part of the charm of these scripts (e.g. the Latin serifs
or the Chinese squared shapes).

Of course, it is OK to offer some resistance to these changes, and to ask
that the next technology be able -- at least -- to do what the older one
allowed. But this does not mean that technological changes should always be
rejected without evaluation. Our modern technology will leave its traces on
writing, just like liquid ink or stone chiselling did. Why shouldn't it?

_ Marco

[apurva:] Thanks for all these detailed examples, Marco! I fully agree that
scripts evolve [and need to] over the course of time. In many of the cases
above, 'users' have chosen the substrate and tools to write, etch, carve,
etc., and have shaped writing with their own decisions. My only submission is
that newer changes to a script should ideally be asked for by the users of
the script, and that a technology should try its best to fulfil them. It
would make me uncomfortable if it were the other way round...

Also, I don't mean to reject technological changes without evaluation. I am
aware that it would not be right of me to expect a screenful of multilingual
text to replace the grandeur of a Rosetta stone in my mind. Newer technology
might not have [or retain] the ingredients that allowed an earlier technology
to do all that it did. However, it might allow for the creation of newer ways
of use, of processing information, etc. I guess we are all aiming at
improving even further the strong points that this technology has. Yes, our
technology too will leave its traces, and I am happy that it will! In fact, I
see this as at least one reason to make us more aware of, and alert to, the
consequences our decisions have.
Thanks,
-apurva


