From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Mar 24 2006 - 06:08:22 CST
[ Sorry if double posting. First one was from an incorrect address, so I
resend. ]
Kent Karlsson wrote:
> Antoine Leca wrote:
>
>> Example 1, Hindi: should the I matra precede the whole
>> cluster, or only the last freestanding consonant, in the
>> case of a cluster constituted from two
>> or more visually distinct components?
>
> A spelling difference that should be recorded in the sequence
> of characters (in some, not yet standardised, way), quite apart
> from font issues.
Do you mean that one SHOULD (IS REQUIRED TO) record at the codepoint level
the use of a two-dot umlaut differently from a two-stroke umlaut, on the
grounds that it is a "spelling" difference?
Do you mean that if I write "Mme" (Mrs in French), I should differentiate,
in a not yet standardised way, whether or not I write it with superscript
characters, on the grounds that it is a "spelling" difference?
I guess you did not.
So, if the original encoder does NOT make a distinction in meaning between
the two forms, why would Unicode require him to encode this difference at
codepoint level?
I agree that Unicode could define a way to REQUEST one of the two forms,
when they are viewed as different, similar to the way one requests the
formation, or not, of single-glyph ligatures with the ZWJ/ZWNJ joiners.
But it should be optional (and supplementary), not mandatory.
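
To illustrate what I mean by "optional and supplementary", here is a minimal
sketch in Python. It uses a Devanagari cluster (KA + VIRAMA + SSA) purely as
a familiar example of the existing ZWJ/ZWNJ mechanism; the helper names
strip_joiners and codepoints are hypothetical, chosen for this message only.
The point is that the joiners are hints layered on top of the same
underlying text, and removing them gives back the plain spelling.

```python
# Assumption: a Devanagari cluster is used only as an illustration of the
# existing joiner mechanism; nothing here is specific to Malayalam.
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: asks for the non-ligated form
ZWJ = "\u200D"   # ZERO WIDTH JOINER: asks for a joining (e.g. half) form

KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"

plain = KA + VIRAMA + SSA                 # rendering left to the font
explicit_virama = KA + VIRAMA + ZWNJ + SSA
half_form = KA + VIRAMA + ZWJ + SSA


def strip_joiners(text: str) -> str:
    """Hypothetical helper: drop the optional rendering hints."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


def codepoints(text: str) -> str:
    """Hypothetical helper: show a string as U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, text in [("plain", plain),
                    ("explicit virama", explicit_virama),
                    ("half form", half_form)]:
    print(f"{label:16} {codepoints(text)}")

# All three variants carry the same underlying spelling.
print(strip_joiners(explicit_virama) == plain == strip_joiners(half_form))
```

Someone who does not care about the distinction simply types the plain
sequence; only someone who does care has to add anything.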
>> Example 2, Malayalam: dead RA can come either before the
>> (last part of the) consonant, or below it.
>
> A spelling difference that should be recorded in the sequence
> of characters (in some, not yet standardised, way), quite apart
> from font issues.
Worse here, much worse.
The difference is between two rendering styles, BOTH of which are known to
be in current use (whatever is loudly asserted to the contrary by both
camps).
And it was a conscious (and reaffirmed) decision of ISO/Unicode to encode
them jointly.
What you are asking here is to BAN one of the two forms of writing Malayalam
from using the straightforward encoding.
However, which form will be banned has not yet been standardised.
So each camp has to argue its case as loudly as it can.
In the meantime, chaos reigns, and grassroots Malayalees are unable
to use Unicode.
I find such a state of affairs to be bad, really bad.
Again, this is NOT to say that one could not find a way to specify the use
of one or the other style; but it probably has to be done outside of the
codepoint stream, at least if one wants to keep the joint encoding from
becoming a fiction...
>> Example 3, Malayalam again: the matra for AU U+0D4C can be
>> shown either as
>> two parts (as depicted in the tables), or only as the right part.
>
> No it cannot. AU spelled with U+0D4C unambiguously has two
> (visible) parts. AU with only the right part is unambiguously
> spelled with U+0D57 (quite regardless of the character name).
I am confused here (and this is hardly new).
I agree that U+0D57 (like its siblings xx55, xx56 or xx57 in the other
scripts) has the same properties etc. as the vowel signs, so this use would
be possible without surgery on the UCD. But the current (5.0 draft)
database says:
0D57 MALAYALAM AU LENGTH MARK
* only a representation of the right half of 0D4C
And I am not sure this should be interpreted as you did.
In fact, I read the word "only" as implying... the complete contrary.
The French translation is not clearer:
0D57 SIGNE DE LONGUEUR MALAYALAM AOU
* simplement la représentation de la moitié droite de 0D4C
Designating it as the valid form to encode single-part Malayalam AU should
IMHO be clearly spelled out by the UTC; and it is relatively easy to do so
(by amending the note, for example).
Last time I looked, no such decision had been made; in fact, it had not even
been decided to look at this issue (it is absent from the list of pending
Indic issues), despite its being brought into the debate every now and then.
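
For anyone who wants to check the properties mentioned above, a small sketch
with Python's standard unicodedata module follows. Note the assumption: it
reports the data of whichever Unicode version ships with the Python
interpreter, not the 5.0 draft quoted here, and it says nothing about how
the note on 0D57 should be interpreted.

```python
import unicodedata

AU_SIGN = "\u0D4C"         # MALAYALAM VOWEL SIGN AU (two-part form in the charts)
AU_LENGTH_MARK = "\u0D57"  # MALAYALAM AU LENGTH MARK (the right-hand part)

# Name, general category and canonical combining class of each character.
for ch in (AU_SIGN, AU_LENGTH_MARK):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.category(ch),
          unicodedata.combining(ch))

# Canonical decomposition of the two-part AU sign, as a string of hex codes.
print("decomposition of U+0D4C:", unicodedata.decomposition(AU_SIGN))

# Both characters share the general category of the spacing vowel signs.
same_category = (unicodedata.category(AU_SIGN)
                 == unicodedata.category(AU_LENGTH_MARK))
print("same general category:", same_category)
```

The shared category is what makes the proposed use technically possible;
whether it is the intended use is exactly what I would like the UTC to state.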
I do not know if there is a mechanism for clarification requests (either at
UTC or WG2 level), but it might be useful here, since the informal way does
not seem to be operative.
> This is already very clear, but apparently needs to be pointed out.
It may be clear to you, but (and I beg your pardon for the offence), I would
very much prefer clear statements from the relevant persons.
In the Indic forum, which is supposed to sort out this kind of thing, none
of the unofficial spokespersons of the UTC has clarified any of these issues,
quite the contrary.
And the relevant logs of the UTC discussions did not make things any clearer,
again quite the contrary (with the stop-and-go of the cillu issue in the
middle). At present, all the work on Malayalam in Unicode has been
deferred to an ad-hoc working group (with no one I know of represented
there). If all the issues were very clear, this working group would
already have delivered its conclusions, or at the very least a draft
presenting the state of affairs; I have not seen such a thing.
I do not claim to know better; as I said, I am not involved in this working
group, nor, I presume, am I qualified to be.
Perhaps you are in this group. In such a case, can I kindly ask you to urge
the group to present some definitive conclusions about the points which are
"clearly" acknowledged (in the way of the document issued by the Kerala IT
Mission, which DOES assert some of the above points, but which I cannot take
for granted to represent the view of the UTC, quite the contrary.)
Antoine