Marco>> As usual, I cannot stop spitting my little word :-|
Antoine>I believe I am as bad as you are. :-|.
OK, I'll go along. :-|
I'm very much inclined to agree with Marco that nothing *new*
is needed, and also with Antoine that interested parties should
discuss alternatives and agree on what will be done.
Marco said:
>In general, viramas are just characters as any other,
>and can occur *everywhere*. And this is a general
>feature of Unicode: with few reasonable exceptions
>(e.g. unpaired surrogates), Unicode does not have a
>"syntax" that stipulates which sequences of
>characters are legal and which are not.
Marco's general comment about Unicode not having a syntax
(apart from things like surrogates) is, in my understanding,
mostly but not 100% true. For example, the standard does
indicate that Devanagari dependent vowels are to be encoded
after their consonant (in logical order) while Thai vowels are
encoded in visual order (which sometimes means before the
consonant). It's necessary to mandate some things of this sort
so that the standard will get implemented in software, and
implemented in a consistent manner such that data interchange
is possible (and that's the purpose for a character encoding
standard). It would be a big problem for data interchange if
Devanagari dependent vowels were sometimes encoded before and
sometimes after the consonant at the whim of individual
implementers.
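The ordering difference can be seen by listing the stored code points. This is a minimal sketch using Python's standard unicodedata module; the two sample syllables and the helper name describe() are chosen here for illustration only and are not from the discussion above.

```python
import unicodedata

# Devanagari "ki": consonant KA is stored first, then vowel sign I,
# even though the vowel sign is drawn to the left of the consonant.
devanagari_ki = "\u0915\u093F"

# Thai "ke": vowel SARA E is stored first, then consonant KO KAI,
# matching the left-to-right visual order.
thai_ke = "\u0E40\u0E01"


def describe(text):
    """Return (code point, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, text in (("Devanagari", devanagari_ki), ("Thai", thai_ke)):
    print(label, describe(text))
```

In both cases the vowel appears to the left of the consonant on the page, but only the Thai data stores it first.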
In my mind, more of this is actually needed. Several months
ago, we were working on our Yi font, and the samples that our
clients showed us had occasional use of a middle dot as
punctuation. Now, how many choices might there be for encoding
this? I never made a thorough count, but it's more than one. I
inquired on this list and with UTC to see if anyone could tell
me what this punctuation character is and how it should be
encoded, and nobody gave a definitive answer, probably because
nobody had considered it before. We ended up using U+30FB
KATAKANA MIDDLE DOT since this would have the
fullwidth/monowidth properties needed for Yi. But what if
another implementer chose to use one of the other characters
with a similar visual appearance? The result would be a
hindrance to successful interchange.
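To make the interchange problem concrete, here is a rough sketch in Python. The candidate list is an assumption made for illustration (it is not an exhaustive count, and nobody in this thread proposed it); only U+30FB is the character actually mentioned above. The helper names profile() and same_after_nfc() are likewise hypothetical.

```python
import unicodedata

# Hypothetical list of characters that can look like a middle dot.
CANDIDATES = ["\u00B7", "\u2022", "\u2219", "\u22C5", "\u30FB", "\uFF65"]


def profile(ch):
    """Return (code point, name, East Asian Width class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


def same_after_nfc(a, b):
    """True if two strings are equal after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for ch in CANDIDATES:
    print(profile(ch))

# Two implementers picking different look-alikes produce text that does
# not compare equal, even after canonical normalization.
print(same_after_nfc("\u00B7", "\u30FB"))
```

The East Asian Width column is the property that matters for the fullwidth requirement; a search or comparison routine that sees one of these characters will not match text that used another.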
But I'm rambling. My point is that it is important for this
issue to be discussed and that implementers agree on a
solution. But, what Marco said about nothing prohibiting
combining virama in new ways is absolutely true, as far as I
know.
Now, Apurva wrote:
>The semantics of Ya in conjunct formation and for
>use with LetterA /LetterE is very different.
Semantics are different in what sense? Do you mean that they
would represent different things phonologically/linguistically,
or that different Unicode semantics would be required? If it's
just a matter of different linguistic significance, that is a
non-issue. The letter "g" has different phonological meaning
in "rag" and in "rough"; "e" has different phonological
meaning between "feet" and "fate". But that doesn't mean
different encodings are needed for these.
There is nothing about the Unicode semantics of Bengali
characters that prohibits using what is already there. All
that's needed is to abandon certain assumptions, which Marco
has already discussed. (I'll forward that message to the
OpenType list for the benefit of people on that list who aren't
on Unicode.) If you want to propose adding new characters to
Unicode, you need to have good reasons why an implementation
using the existing characters is inadequate *in terms of text
processing issues* (not in terms of how speakers/writers think
of the orthography - that is essentially irrelevant).
As far as using the PUA is concerned, yes, that's an option.
It becomes problematic, however, if you want all implementers
to agree on particular PUA characters. Let's say everybody
interested in Bengali gets together and agrees that E000 and
E001 will be used for Vowel A_zophola_AA and Vowel
E_zophola_AA, and let's suppose further that Apurva and co
implement Uniscribe and some OT fonts based on this. In the
meantime, somebody else has (as they are free to do) defined
for their use E000 and E001 for a couple of Ethiopic characters
that are being considered for future addition to Unicode.
(That's a real situation - we're currently doing some work on
Ethiopic, and we have made a number of such PUA assignments.)
Now, that person has an Ethiopic font, and they want to display
some text using MS software. They'll be pretty upset if
Uniscribe munges their PUA characters. It's a legal use of
Unicode for MS to define PUA characters for particular uses
(though they are encouraged to do so near the top of the PUA
range, and they really ought to publicly document what they
do so that users will know what to expect of their software).
But if they want to be concerned about what end users may want
to do with their software, they need to think very carefully
about any PUA assignments they make. As far as encouraging a
widespread pseudo-standard use of the PUA, that is potentially
counter to the intention of Unicode, particularly if you are
trying to get a number of major software developers to go along.
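The collision scenario can be sketched in a few lines of Python. Both tables below are hypothetical: the Bengali labels are the names used in this thread, and the Ethiopic labels are placeholders, not the actual assignments mentioned above. The function names are also invented for this sketch.

```python
import unicodedata

# Hypothetical private agreements; the labels are illustrative only.
BENGALI_PUA = {0xE000: "Vowel A_zophola_AA", 0xE001: "Vowel E_zophola_AA"}
ETHIOPIC_PUA = {0xE000: "Ethiopic candidate 1", 0xE001: "Ethiopic candidate 2"}


def is_private_use(cp):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


def collisions(a, b):
    """Code points that both agreements assign, with different meanings."""
    return sorted(cp for cp in a.keys() & b.keys() if a[cp] != b[cp])


for cp in collisions(BENGALI_PUA, ETHIOPIC_PUA):
    print(f"U+{cp:04X}", is_private_use(cp), BENGALI_PUA[cp], "vs", ETHIOPIC_PUA[cp])
```

Nothing in the character data distinguishes the two uses: the standard only says these code points are private use, so software that hard-codes one agreement will misinterpret text written under the other.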
I have no problem with a couple of PUA characters being used by
a group of people interested in Bengali as an interim solution
for the potential characters. Getting some particular support
for that in Uniscribe would be, I think, not a good thing, and
I'd be very surprised if MS would entertain that possibility.
(But then, if you use the PUA, you don't need any smart font
behaviour for these characters.)
But I'd argue with Marco in favour of your other proposed
interim solution, and I'd argue that it shouldn't be just an
interim solution but rather the permanent solution.
Peter Constable
From: <Antoine.Leca@renault.fr> AT Internet on 04/27/2000 07:07
AM
To: Peter Constable/IntlAdmin/WCT, <unicode@unicode.org> AT
Internet@Ccmail
cc: <unicode@unicode.org> AT Internet@Ccmail, <zzak@csi.com>
AT Internet@Ccmail
Subject: Re: Encoding Bengali Vowel forms (again)
Marco.Cimarosti@icl.com wrote:
>
> As usual, I cannot stop spitting my little word :-|
I believe I am as bad as you are. :-|.
> Abdul Malik wrote in his report:
> > Conclusion
> > "Vowel A_zophola_AA" and "Vowel E_zophola_AA" need to be
> > included in the Bengali Unicode range as separate vowels.
> > [...]
>
> I have no opinions about accepting or not this proposal.
Neither do I. However, on OpenType, Apurva Joshi (who I believe
is also on this list) did comment that this would be a much
better solution than the existing state of affairs (i.e. using
viram-ya after a vowel).
> What I think, however, is that it is wrong to say that such a
> change is *needed* for encoding Bengali.
Like you, I do not believe this is *needed*. BUT, I believe the
issue should be sorted out, in order to provide correct rendering
tools for Bengali, adapted to the chosen solution. Whether the
solution for A-ya will be [\u0985\u09CD\u09AF\u09BE or \u0991 as
proposed by Abdul or \u098D as used (indirectly) in CDAC
products, or \u09Fx as suggested by Apurva], the current
products need to be adjusted anyway.
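For reference, the fully specified candidates can be inspected with Python's unicodedata module. This is only a sketch: the dictionary keys and the describe() helper are labels invented here, the \u09Fx option is left out because its last digit is not given, and the "<unassigned>" results reflect whatever Unicode version the local Python build ships.

```python
import unicodedata

# Candidate encodings for A-ya as listed above (labels are illustrative).
CANDIDATES = {
    "existing sequence": "\u0985\u09CD\u09AF\u09BE",
    "U+0991 (Abdul's proposal)": "\u0991",
    "U+098D (CDAC usage)": "\u098D",
}


def describe(text):
    """Return (code point, name) pairs; unassigned code points get a marker."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unassigned>"))
        for ch in text
    ]


for label, text in CANDIDATES.items():
    print(label, describe(text))
```

The first candidate uses only characters that are already encoded, while the single-code-point candidates sit on positions that have no assigned character in the Bengali block.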
Antoine