From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Mar 25 2006 - 18:15:15 CST
Michael Everson wrote:
>>Is this a Unicore escape?
>
> It was a typo on Ken's part. Discussion of this proposal with members of
> the UTC is fairly intense.
I'm not surprised. There are issues of principle involved and the proposal
sets dangerous precedents. The debate could be worse than the row over
Phoenician. I was a bit slow to understand the 'double hockeysticks' in
this thread's original title.
I've posted this to both lists because there are matters of principle,
matters of fact, and matters of Myanmar script practicalities. I think any
replies on matters of principle belong on the general list, while replies
about the other two belong on the SEAsia list. (Is it being moderated at
the weekend? I don't think general list moderation status automatically
transfers to the SE Asia list.)
> I don't personally have the time or energy to get into too much of it on
> this list.
I'll list the major issues *I* can see. There may be others.
Issue 1: Point of Principle/Pride
The Burmese want their glyph-based input, and it's proposed that they get
it, subject to the following restrictions:
(a) 'Logical' order. (I'm pretty sure it isn't actually phonetic order!)
(b) Smart fonts sort out positional variation and combination of subscript
forms (and this concession is actually a clear gain)
(c) Pali-only / archaic subscript and superscript consonants are written as
subjoining character plus normal character and similar methods.
Issue 2: Disunifying AA - Point of Principle
> And then there are differences between Burmese and Mon, where Mon
> regularly uses TALL AA with PHA, though modern Burmese doesn't. (An 1840
> Burmese Bible does, however.)
> It is proposed to disunify TALL AA from AA because (1) adding a
> non-variable TALL AA for Karen use would introduce ambiguous encounters
And be pointless. S'gaw Karen just uses different glyph variants to
Burmese, and the tall AA and short AA glyphs happen to be the same.
I can see two types of argument that would justify disunifying the variants.
Single language argument: The rules for choosing between tall AA and short
AA cannot reasonably be implemented in a rendering system. Cf. the two
forms of Latin small 's' and the two forms of Greek small sigma, for which
I think minimal pairs actually exist. The single language need not be
Burmese; Mon would also do.
Mixed-language argument: There are (or will be) many documents displaying
two or more of Burmese, Mon and S'gaw Karen together, with the text in each
language using that language's preferred systems of selecting between the AA
forms. (This is akin to demonstrating that there are three separate
scripts, and then saying that corresponding characters should mostly be
unified.)
The easiest examples, if they existed, would be lists of people's names from
both languages that used the 'aa' form appropriate to the person's language.
This would be both a single-language argument and a mixed-language argument.
The proposal presents evidence for neither argument.
> In scripts like Lanna and Myanmar, where it is really *not* possible to
> contextually select the display, the only sensible thing is to encode both
> AA and TALL AA and let users use the one they want when they want it.
If you can justify that statement for the Myanmar script, then you have
established the case for separate encoding of AA and TALL AA. That still
leaves open the option of doing it by variation selectors, but they can be
rendered pointless by the Burmese always using a variation selector.
(Pressing an AA key can generate two characters - the generic AA and the
appropriate variation selector.) Does anyone care to expound the theory of
variation selectors? There may be words in white in the TUS saying 'only
for unifying CJK variants that the Chinese (or Japanese, especially with
surnames) insist are different.'
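
To make the parenthetical about the AA key concrete, here is a minimal Python sketch. Everything about the layout is an assumption for illustration: the key names are made up, and no variation sequence for MYANMAR VOWEL SIGN AA is registered. It only shows that if every AA keystroke emits the generic AA plus a selector, the unified character never appears bare, which is the sense in which the selectors become pointless.

```python
import unicodedata

AA = "\u102C"   # MYANMAR VOWEL SIGN AA (the generic, unified AA)
VS1 = "\uFE00"  # VARIATION SELECTOR-1
VS2 = "\uFE01"  # VARIATION SELECTOR-2

# Hypothetical keyboard layout: each AA key emits two characters.
# Which selector would mean "short" or "tall" is an assumption here;
# no such variation sequences are defined for U+102C.
KEY_OUTPUT = {
    "short_aa_key": AA + VS1,
    "tall_aa_key": AA + VS2,
}


def strip_variation_selectors(text):
    """Drop U+FE00..U+FE0F, leaving only the unified characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


KA = "\u1000"  # MYANMAR LETTER KA
GA = "\u1002"  # MYANMAR LETTER GA

# Two syllables typed with the hypothetical layout.
typed = KA + KEY_OUTPUT["short_aa_key"] + GA + KEY_OUTPUT["tall_aa_key"]

# Every AA in the typed text is followed by a selector, so the text is
# effectively using two distinct "characters" spelled as sequences.
unified = strip_variation_selectors(typed)
```

Stripping the selectors recovers the unified spelling, so a process that ignores them sees no distinction at all, while one that honours them sees what amounts to a disunified pair.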
At present there is the significant possibility of ISO/IEC opposition to
this disunification.
Issue 3: Abolition of Unicode Virama: Floodgate, Myanmar Stability, and
Stability Pact
The creation of ASAT in place of the 2-character visible virama and the
restriction of virama to a subjoining role immediately invalidates most
Unicoded Myanmar script text, including my paltry creations. In principle
that's a BAD THING. For myself, I welcome it and look forward to the
upgrade of SIL's Padauk font.
However, I can see a clamour for other scripts to convert the Unicode virama
to a historical footnote and separate the concepts of conjoining and visible
virama. Unfortunately, this will cause two canonically inequivalent ways of
doing the same thing. They have to be canonically inequivalent, because
virama + ZWNJ has to continue to be in Normal Form C. Such requests are
likely to be refused as a point of principle.
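
The canonical-inequivalence point can be checked with Python's unicodedata module. This is a sketch under one assumption that goes beyond the proposal under discussion: it uses U+103A, the code point at which MYANMAR SIGN ASAT was later encoded, as the stand-in for the new visible sign.

```python
import unicodedata

VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
# Assumption: ASAT at U+103A, where it was later encoded; the email
# itself names no code point.
ASAT = "\u103A"

# The existing two-character spelling of the visible virama.
visible_old = VIRAMA + ZWNJ


def nfc(text):
    return unicodedata.normalize("NFC", text)


def nfd(text):
    return unicodedata.normalize("NFD", text)


# The old spelling is already in Normal Form C: normalising leaves it alone.
already_nfc = nfc(visible_old) == visible_old

# Two strings are canonically equivalent iff their NFD forms are identical.
canonically_equivalent = nfd(visible_old) == nfd(ASAT)
```

Here `already_nfc` comes out true and `canonically_equivalent` false, so normalisation will never fold the old spelling into the new one; any migration has to be done explicitly.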
Issue 4: Great SA: Mystification, precedent
I am truly baffled as to why this conjunct needs its own encoding. It is
currently encoded as SA, virama, SA, which will now come to represent the
transparent, unligated form. However, no evidence was provided that this
form actually occurs! If it did, I would suggest that SA, ZWNJ, VIRAMA, SA
be the appropriate representation. (Remember that Myanmar VIRAMA would no
longer be a Unicode virama!)
This sets a dangerous precedent of allowing a separate character for every
non-transparent conjunct. Codepoint for Devanagari KSHA? JNYA? Moreover,
by the stability pact, these will be inequivalent to the current sequences.
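
For reference, the sequences mentioned above can be spelled out in Python. This is only an illustrative listing: the variable names are mine, and the Devanagari sequences are the usual KA + virama + SSA and JA + virama + NYA spellings of those conjuncts.

```python
import unicodedata


def describe(seq):
    """List each code point in seq with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


SA = "\u101E"      # MYANMAR LETTER SA
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Current encoding of Great SA.
great_sa_current = SA + VIRAMA + SA

# Representation suggested above for a transparent, unligated form,
# should evidence for it ever turn up.
unligated_suggested = SA + ZWNJ + VIRAMA + SA

# Devanagari conjuncts that could claim the same treatment.
ksha = "\u0915\u094D\u0937"  # KA, VIRAMA, SSA
jnya = "\u091C\u094D\u091E"  # JA, VIRAMA, NYA

for label, seq in [
    ("Great SA (current)", great_sa_current),
    ("Unligated SA+SA (suggested)", unligated_suggested),
    ("KSHA", ksha),
    ("JNYA", jnya),
]:
    print(label, describe(seq))

# None of these sequences changes under NFC, so a newly encoded
# single-character conjunct could not be made equivalent to them.
stable_under_nfc = all(
    unicodedata.normalize("NFC", s) == s
    for s in (great_sa_current, unligated_suggested, ksha, jnya)
)
```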
Issue 5: Medial WA: Practical Issue
This is partly a question of phonology. Medial WA and WA as part of a 'true
conjunct' will be encoded differently. How does one tell them apart when
entering them? Much of the time, how will one tell them apart visibly?
And, finally, are the people of Burma consistent in deciding whether a 'WA'
in the middle of a word is medial or the second part of a conjunct?
Fortunately, 'true conjunct' WA is fairly rare, but it occurs in the sort of
word liable to be learnt from a book rather than from speech.
Quibbles:
The new way of encoding kinzi seems unnaturally complicated, and quite
inappropriate for repha. I will have to re-read that section - it
isn't making sense to me.
I'm not sure that Graphite + Padauk is the only Unicode 4.1-compliant
implementation of the Burmese script outside of Burma.
Should U+1039 MYANMAR SIGN VIRAMA be the conjoiner or the visible sign? The
(immutable) name implies the visible sign, but the proposal makes it the
conjoiner.
Richard.