Re: Khmer Subscripts: encode directly or no

From: Timothy Partridge (timpart@perdix.demon.co.uk)
Date: Fri Aug 15 1997 - 15:54:38 EDT


In message <9708151259.AA25496@unicode.org> you recently said:

> It would be good to summarize some of the issues facing Khmer encoding of
> subscripts so as to come to an informed decision:

> Syntax of Unicode requires that escape code follow character it modifies
> (however ISCII VIRAMA has passed muster which has similar semantics
> to Khmer; code placement is the same whether we think of it as coming
> before the subscript or after the preceding consonant/subscript)
>
> In sum, two code encoding of subscripts allows Khmer to be encoded more
> thoroughly than direct (one code) encoding of subscripts. To me the
> preferable decision is obvious: two code encoding.

Two code encoding also seems to be consistent with Thai, which I
understand is closely related to Khmer.
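
To make the "two code" idea concrete, here is a minimal sketch in Python. It
uses the Devanagari virama (U+094D) as a stand-in for whatever marker Khmer
might use; the function name encode_cluster is hypothetical and only for
illustration. Each consonant after the first is preceded by the marker, so a
cluster of N consonants takes 2N-1 codes.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, used here only as a stand-in marker


def encode_cluster(consonants):
    """Join consonants so that each one after the first is preceded by the marker.

    Hypothetical helper: it models the two code scheme, not any real Khmer encoding.
    """
    return VIRAMA.join(consonants)


# KA followed by a dependent SSA: three codes in the text stream.
cluster = encode_cluster(["\u0915", "\u0937"])
code_points = [f"U+{ord(ch):04X}" for ch in cluster]
print(code_points)
```

Under direct (one code) encoding, every subscript form would instead need a
code of its own.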

I know next to nothing about Khmer, but looking in a dictionary (English-
Khmer Dictionary, Huffman and Proum) two points seemed to stand out.

There is a vowel symbol which can stand on its own or as the second part of
a two part vowel. It looks like a small vertical line above the consonant.
I think it is a short "a" sound by itself and it shortens other vowels.
The dictionary states that the symbol is put over the consonant *following*
the consonant that the vowel affects. In this way it is a bit like the two
part Indic vowels, but the right half goes right rather than the left half
left. This raises the issue of where the symbol should appear in the text
stream. Should it appear immediately after the consonant it influences, or
after the consonant it is printed on? The existing Indic treatment would
suggest the spoken rather than the written sequence. How is this treated
by Cambodians?
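
As an illustration of the existing Indic treatment, here is a small Python
sketch using a Bengali two part vowel (my choice of example, not something
from the Khmer proposal). The vowel sign O is drawn with one half to the left
of the consonant and one half to the right, yet both halves are stored after
the consonant in the text stream.

```python
import unicodedata

# BENGALI LETTER KA followed by the two part BENGALI VOWEL SIGN O.
syllable = "\u0995\u09CB"

# Canonical decomposition splits the vowel into its left and right halves.
decomposed = unicodedata.normalize("NFD", syllable)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The left half (vowel sign E) comes after the consonant in storage even though
it is printed before it, so the stored order follows speech, not the page.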

There are two diacritics which go above the letter (two vertical lines
and an inverted w). I think they are tone marks. The dictionary says that
when there is also a vowel that is written above the consonant, the diacritic
gives way to it and moves below the consonant and is written as a single
vertical line. Will this be left to the rendering engine, or will there be
a separate "vertical line below" diacritic which the user will use instead?
(I think the Arabic positioning of hamza above or below alef is user selected.)
A moving diacritic presents some challenges in giving it a combining class!
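
To show why a moving mark is awkward, here is a short Python sketch using two
generic combining marks (an assumption for illustration; they are not Khmer
characters). Each mark has one fixed combining class tied to where it
attaches, and normalization reorders marks by that class.

```python
import unicodedata

ACUTE_ABOVE = "\u0301"  # COMBINING ACUTE ACCENT, attaches above
DOT_BELOW = "\u0323"    # COMBINING DOT BELOW, attaches below

above_class = unicodedata.combining(ACUTE_ABOVE)
below_class = unicodedata.combining(DOT_BELOW)
print(above_class, below_class)

# Canonical ordering sorts marks on one base by ascending combining class,
# so the "below" mark is moved in front of the "above" mark.
typed = "a" + ACUTE_ABOVE + DOT_BELOW
normalized = unicodedata.normalize("NFD", typed)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

A single character that sits above in some contexts and below in others
cannot be given both classes at once.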

I apologise if these questions are naive.

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer


