Document: L2/01-310
Title: Khmer Issues on the Horizon
Authors: Rick McGowan and Ken Whistler
Some factions within Cambodia appear to perceive difficulties with the Khmer
encoding; we have had reports to this effect. The trouble appears to be
twofold:
(1) looming political trouble, owing to a feeling among some that
Cambodian interests were neglected or ignored during the development
of the Khmer encoding in 10646/Unicode.
(2) misunderstanding of and/or disagreement with the current model for
encoding Khmer.
Below are some facts and rumors, as we understand them.
* RUMOR: The government of Cambodia has apparently contacted ISO JTC1
directly with complaints regarding the encoding. They may call for
rescinding the 10646 encoding at that level.
* FACT: During the initial development of the Khmer encoding, Glenn Adams
urged careful procedure, and at one time planned a trip to Cambodia,
which never materialized.
* FACT: Norbert Klein (who is _IN_ Cambodia) claims that in early contact,
circa 1997, he offered to put Unicode people in contact with government
officials; he now reports that nobody on the Unicode side followed through.
For our part, we wonder why, if he was on the ground there, he did not
simply take action to involve more people. He was on the Khmer mail list
at Unicode and was involved in all the discussions.
* FACT: There is a Cambodian government project underway to define a
national standard character encoding. RUMOR: We have heard that this
committee desires a "one codepoint, one character" approach, and it seems
possible that they do not understand the current model, or understand it
but disagree with it strongly enough to proceed with standardizing a
different approach.
* RUMOR: Microsoft apparently has a working model of the current encoding
in-house.
* RUMOR: There apparently exists a Japanese-funded philological project,
which in one report is urging a different sort of encoding and in another
report is awaiting a national standard to be handed down. In neither case,
apparently, is Unicode being considered. (It also seems possible that they
do not understand the current model.)
* FACT: Rick twice sent e-mail to Sorasak Pan urging contact between his
committee and Unicode, and has received no response to date. Address:
Sorasak Pan
Under Secretary of State
Royal Government of Cambodia
Russian Federation Blvd.
Tel: (855 23) 426 054
Fax: (855 23) 218 673
e-mail: Great_Lake@bigpond.com.kh
* FACT: The relevant experts involved in the encoding are Maurice Bauhahn
and Paul Nelson, outside Cambodia, and Norbert Klein, inside Cambodia.
* ANALYSIS (Ken):
The basic technical issue, as best I understand it,
boils down to the virama model versus the encoding of subscript consonants
(and vowels). The current Unicode model for the Khmer script assumes the
virama model, as for many other Brahmi-derived scripts, including
Myanmar. However, as was apparent in the Japanese NB comments on Amendment
25, there were experts at the time who disagreed with that approach and
favored an explicit subscript encoding for Khmer. While the virama model
was discussed in Cambodia and apparently had some support from some
technologists there, there appear to have been major political
shifts, now resulting in significant opposition to that approach, apparently
at a ministerial level in the government. I expect that any new proposals
emerging from Cambodia and/or Japan will basically seek to encode explicit
subscripts for Khmer. It is also quite likely that any such proposal will,
once again, take the form of "remove the current encoding and replace it
with the xxx national standard for yyy," rather than an attempt to make
delta additions to the current encoding.
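To make the contrast concrete, here is a minimal Python sketch (not taken from any proposal) of how one consonant cluster, KA with a subscript TA, would be represented under each model. The code points for KHMER LETTER KA, KHMER LETTER TA and KHMER SIGN COENG are real Unicode assignments; the explicit-subscript mapping is purely hypothetical and uses a Private Use Area value as a stand-in, since no actual national-standard code points are assumed here. The helper names are likewise illustrative only.

```python
import unicodedata

# Real Unicode code points (Khmer block, U+1780..U+17FF).
KA = "\u1780"     # KHMER LETTER KA
TA = "\u178F"     # KHMER LETTER TA
COENG = "\u17D2"  # KHMER SIGN COENG: the virama-like sign used for subscripts

def virama_cluster(base: str, subscript: str) -> str:
    """Current (virama) model: a subscript consonant is encoded as COENG
    followed by the ordinary consonant letter."""
    return base + COENG + subscript

# HYPOTHETICAL explicit-subscript model: each subscript form gets its own
# code point. The Private Use Area value is a placeholder only; no such
# assignment exists in any standard.
HYPOTHETICAL_SUBSCRIPT = {TA: "\uE000"}

def explicit_cluster(base: str, subscript: str) -> str:
    """Alternative model: base consonant plus a single subscript code point."""
    return base + HYPOTHETICAL_SUBSCRIPT[subscript]

def describe(text: str) -> list[str]:
    """List each code point with its Unicode name, or a placeholder label
    for the unnamed Private Use Area stand-in."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'HYPOTHETICAL SUBSCRIPT')}"
            for ch in text]

if __name__ == "__main__":
    print(describe(virama_cluster(KA, TA)))
    print(describe(explicit_cluster(KA, TA)))
```

Under the virama model the cluster takes three code points and the subscript reuses the ordinary consonant letter; under an explicit-subscript model it would take two, with a distinct character for the subscript form, which is closer to the "one codepoint, one character" approach the Cambodian committee reportedly wants.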
Note that the Khmer script is used essentially only in Cambodia, so
there is a prima facie case for the government or relevant
ministry to be the decisive stakeholder in this case. This is
a much easier case to make for a local script like this than for
a multinational script like Latin, Cyrillic, or Han. It will almost
certainly be argued this way at the JTC1 level, if it comes to that.
I also suspect that all parties here have some well-intentioned,
strong arguments about productivity in information technology in
mind -- particularly for keyboard entry. Different opinions about
what is "right" and more efficient for shifting people over from
existing typewriter keyboarding practice to computerized text entry
may be part of what is driving people to different positions regarding
what is the best encoding for Khmer.
We are bringing these issues to the attention of the UTC, since we
suspect that the Khmer encoding will be raised at the upcoming
Singapore meeting of WG2, and that in that context a very forceful
case will be made to change the Khmer encoding in 10646 (with obvious
implications for the Unicode Standard).