From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Mar 25 2006 - 18:15:15 CST
Michael Everson wrote:
>>Is this a Unicore escape?
>
> It was a typo on Ken's part. Discussion of this proposal with members of 
> the UTC is fairly intense.
I'm not surprised.  There are issues of principle involved and the proposal 
sets dangerous precedents.  The debate could be worse than the row over 
Phoenician.  I was a bit slow to understand the 'double hockeysticks' in 
this thread's original title.
I've posted this to both lists because there are matters of principle, 
matters of fact, and matters of Myanmar script practicalities.  I think any 
replies on matters of principle belong on the general list, while replies 
about the other two belong on the SEAsia list.  (Is it being moderated at 
the weekend?  I don't think general list moderation status automatically 
transfers to the SE Asia list.)
> I don't personally have the time or energy to get into too much of it on 
> this list.
I'll list the major issues *I* can see.  There may be others.
Issue 1: Point of Principle/Pride
The Burmese want their glyph-based input, and it's proposed that they get 
it, subject to the following restrictions:
(a) 'Logical' order.  (I'm pretty sure it isn't actually phonetic order!)
(b) Smart fonts sort out positional variation and combination of subscript 
forms (and this concession is actually a clear gain)
(c) Pali-only / archaic subscript and superscript consonants are written as 
subjoining character plus normal character and similar methods.
Issue 2: Disunifying AA - Point of Principle
> And then there are differences between Burmese and Mon, where Mon 
> regularly uses TALL AA with PHA, though modern Burmese doesn't. (An 1840 
> Burmese Bible does, however.)
> It is proposed to disunify TALL AA from AA because (1) adding a 
> non-variable TALL AA for Karen use would introduce ambiguous encounters
And be pointless.  S'gaw Karen just uses different glyph variants to 
Burmese, and the tall AA and short AA glyphs happen to be the same.
I can see two types of argument that would justify disunifying the variants.
Single language argument:  The rules for choosing between tall AA and short 
AA cannot reasonably be implemented in a rendering system.  Cf. the two 
forms of Latin small 's' and the two forms of Greek small sigma, for which 
I think minimal pairs actually exist.  The single language need not be 
Burmese; Mon would also do.
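A naive rule of the kind the single-language argument has in mind can be sketched in Python, for illustration only. The consonant sets are assumptions, not taken from the proposal: the Burmese set is assumed to be KHA, GA, NGA, DA, PA and WA, and the Mon set simply adds PHA, following the remark quoted above. The rule looks only at the character immediately before AA, which is exactly where it breaks down once medials or stacked consonants come between the base and the vowel.

```python
# Hypothetical, simplified AA-selection rule; not taken from the proposal.
AA = "\u102C"  # MYANMAR VOWEL SIGN AA

# Assumed base-consonant sets that take the tall AA, for illustration only.
BURMESE_TALL = {"\u1001", "\u1002", "\u1004", "\u1012", "\u1015", "\u101D"}  # KHA GA NGA DA PA WA
TALL_AA_BASES = {
    "burmese": BURMESE_TALL,
    "mon": BURMESE_TALL | {"\u1016"},  # assumed: the Burmese set plus PHA
}


def aa_forms(text, language):
    """Return 'tall' or 'short' for each AA, judged only by the preceding character."""
    bases = TALL_AA_BASES[language]
    forms = []
    for i, ch in enumerate(text):
        if ch == AA:
            prev = text[i - 1] if i > 0 else ""
            forms.append("tall" if prev in bases else "short")
    return forms


pha_aa = "\u1016\u102C"  # PHA + AA
print(aa_forms(pha_aa, "burmese"))  # ['short']
print(aa_forms(pha_aa, "mon"))      # ['tall']
```

The same PHA + AA sequence gets a different glyph depending on the language tag, and any medial or subscript consonant between the base and AA hides the base from this one-character lookback.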
Mixed-language argument:  There are (or will be) many documents displaying 
two or more of Burmese, Mon and S'gaw Karen together, with the text in each 
language using that language's preferred systems of selecting between the AA 
forms.  (This is akin to demonstrating that there are three separate 
scripts, and then saying that corresponding characters should mostly be 
unified.)
The easiest examples, if they existed, would be lists of people's names from 
both languages that used the 'aa' form appropriate to the person's language. 
This would be both a single language argument and a mixed-language script.
The proposal presents evidence for neither argument.
> In scripts like Lanna and Myanmar, where it is really *not* possible to 
> contextually select the display, the only sensible thing is to encode both 
> AA and TALL AA and let users use the one they want when they want it.
If you can justify that statement for the Myanmar script, then you have 
established the case for separate encoding of AA and TALL AA.  That still 
leaves open the option of doing it by variation selectors, but they can be 
rendered pointless by the Burmese always using a variation selector. 
(Pressing an AA key can generate two characters - the generic AA and the 
appropriate variation selector.)  Does anyone care to expound the theory of 
variation selectors?  There may be words in white in the TUS saying 'only 
for unifying CJK variants that the Chinese (or Japanese, especially with 
surnames) insist are different.'
At present there is the significant possibility of ISO/IEC opposition to 
this disunification.
Issue 3: Abolition of Unicode Virama: Floodgate, Myanmar Stability, and 
Stability Pact
The creation of ASAT in place of the 2-character visible virama and the 
restriction of virama to a subjoining role immediately invalidates most 
Unicoded Myanmar script text, including my paltry creations.  In principle 
that's a BAD THING.  For myself, I welcome it and look forward to the 
upgrade of SIL's Padauk font.
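Upgrading existing text would at minimum mean rewriting the two-character visible virama as the new ASAT. A minimal sketch, assuming the visible virama is stored as U+1039 followed by ZWNJ and that ASAT is encoded at U+103A (the code point is an assumption here). A real converter would also have to deal with medials, kinzi and the other sequences the proposal re-encodes; this one deliberately does not.

```python
# Minimal sketch: convert the old two-character visible virama to ASAT.
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ASAT = "\u103A"    # assumed code point for the proposed ASAT


def upgrade_visible_virama(text):
    """Replace VIRAMA + ZWNJ with ASAT; leave conjoining VIRAMA untouched."""
    return text.replace(VIRAMA + ZWNJ, ASAT)


killed = "\u1000\u1039\u200C"   # KA with visible virama (old encoding)
stacked = "\u1000\u1039\u1000"  # KA + VIRAMA + KA (conjoining use)
print(upgrade_visible_virama(killed) == "\u1000\u103A")  # True
print(upgrade_visible_virama(stacked) == stacked)        # True
```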
However, I can see a clamour for other scripts to convert the Unicode virama 
to a historical footnote and separate the concepts of conjoining and visible 
virama.  Unfortunately, this will cause two canonically inequivalent ways of 
doing the same thing.  They have to be canonically inequivalent, because 
virama + ZWNJ has to continue to be in Normal Form C.  Such requests are 
likely to be refused as a point of principle.
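The normalization point can be checked with Python's unicodedata module. Devanagari KA + VIRAMA + ZWNJ (the explicit-halant sequence) is already in both NFC and NFD, so normalization leaves it alone. Likewise, normalization never maps the Myanmar VIRAMA + ZWNJ pair onto the ASAT character (assumed here to be U+103A), so the two encodings of a visible virama would stay canonically inequivalent.

```python
import unicodedata

KA = "\u0915"          # DEVANAGARI LETTER KA
DEV_VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200C"        # ZERO WIDTH NON-JOINER
MY_VIRAMA = "\u1039"   # MYANMAR SIGN VIRAMA
ASAT = "\u103A"        # assumed code point for the proposed ASAT


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


explicit_halant = KA + DEV_VIRAMA + ZWNJ
print(unicodedata.is_normalized("NFC", explicit_halant))  # True
print(unicodedata.is_normalized("NFD", explicit_halant))  # True
print(canonically_equivalent(MY_VIRAMA + ZWNJ, ASAT))     # False
```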
Issue 4: Great SA: Mystification, precedent
I am truly baffled as to why this conjunct needs its own encoding.  It is 
currently encoded as SA, virama, SA, which will now come to represent the 
transparent, unligated form.  However, no evidence was provided that this 
form actually occurs!  If it did, I would suggest that SA, ZWNJ, VIRAMA, SA 
be the appropriate representation.  (Remember that Myanmar VIRAMA would no 
longer be a Unicode virama!)
This sets a dangerous precedent of allowing a separate character for every 
non-transparent conjunct.  Codepoint for Devanagari KSHA?  JNYA?  Moreover, 
by the stability pact, these will be inequivalent to the current sequences.
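The inequivalence is easy to demonstrate. A minimal sketch, assuming the dedicated GREAT SA is encoded at U+103F with no decomposition mapping (the code point is an assumption here). Neither the stacked sequence SA, VIRAMA, SA nor the transparent form SA, ZWNJ, VIRAMA, SA suggested above normalizes to it, so text using the current sequence would never compare equal to text using the new character.

```python
import unicodedata

SA = "\u101E"        # MYANMAR LETTER SA
VIRAMA = "\u1039"    # MYANMAR SIGN VIRAMA
ZWNJ = "\u200C"      # ZERO WIDTH NON-JOINER
GREAT_SA = "\u103F"  # assumed code point for the proposed GREAT SA


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


stacked = SA + VIRAMA + SA             # current encoding of the conjunct
transparent = SA + ZWNJ + VIRAMA + SA  # transparent form suggested above
print(unicodedata.decomposition(GREAT_SA) == "")      # True
print(canonically_equivalent(stacked, GREAT_SA))      # False
print(canonically_equivalent(transparent, GREAT_SA))  # False
```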
Issue 5: Medial WA: Practical Issue
This is partly a question of phonology.  Medial WA and WA as part of a 'true 
conjunct' will be encoded differently.  How does one tell them apart when 
entering them?  Much of the time, how will one tell them apart visibly? 
And, finally, are the people of Burma consistent in deciding whether a 'WA' 
in the middle of a word is medial or the second part of a conjunct? 
Fortunately, 'true conjunct' WA is fairly rare, but it occurs in the sort of 
word liable to be learnt from a book rather than from speech.
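The practical problem is that the two encodings share no common form. A minimal sketch, assuming medial WA gets its own character at U+103D while conjunct WA remains VIRAMA + WA (the code point is an assumption here). KA with medial WA and KA with conjunct WA may look much alike on screen, yet they are different, canonically inequivalent strings, so a search for one will not find the other.

```python
import unicodedata

KA = "\u1000"         # MYANMAR LETTER KA
VIRAMA = "\u1039"     # MYANMAR SIGN VIRAMA (conjoining use under the proposal)
WA = "\u101D"         # MYANMAR LETTER WA
MEDIAL_WA = "\u103D"  # assumed code point for the proposed medial WA

conjunct = KA + VIRAMA + WA  # WA as the second part of a 'true conjunct'
medial = KA + MEDIAL_WA      # KA with medial WA


def nfc(s):
    return unicodedata.normalize("NFC", s)


print(nfc(conjunct) == nfc(medial))  # False
print(medial in conjunct)            # False: a search for one misses the other
```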
Quibbles:
The new way of encoding kinzi seems unnaturally complicated, and quite 
inappropriate for repha.  I will have to re-read that section; it 
isn't making sense to me.
I'm not sure that Graphite + Padauk is the only Unicode 4.1-compliant 
implementation of the Burmese script outside of Burma.
Should U+1039 MYANMAR SIGN VIRAMA be the conjoiner or the visible sign?  The 
(immutable) name implies the visible sign, but the proposal makes it the 
conjoiner.
Richard. 
This archive was generated by hypermail 2.1.5 : Sat Mar 25 2006 - 18:20:04 CST