From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 21 2007 - 03:29:13 CDT
Who or what decides the correct order of combining marks in Thai-script
strings when some of the marks belong to the 'inherited' class? Is the
order established? I appreciate that the behaviour of interacting marks
is undefined in such cases.
The problem arises when copying the pronunciation from English-Thai
dictionaries. For example, is the pronunciation of 'vision' correctly
entered as
วิชʹช͙ัน or วิชʹชั͙น? (This notation is taken from a pocket dictionary.)
The first sequence has <U+0E0A THAI CHARACTER CHO CHANG, U+0359 COMBINING
ASTERISK BELOW, U+0E31 THAI CHARACTER MAI HAN-AKAT> and the second has
<U+0E0A, U+0E31, U+0359>. They are canonically inequivalent because U+0E31
has canonical combining class 0, so normalisation never reorders it relative
to U+0359. The Thai and Latin sequencing principles, plus the fact
that there is a functional unit <U+0E0A, U+0359> representing the 'zh'
sound, argue for <U+0E0A, U+0359, U+0E31>, but the overstrict Uniscribe
implementation on Windows XP seems to argue for <U+0E0A, U+0E31, U+0359>.
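
The inequivalence of the two sequences can be checked mechanically. What
follows is only a minimal sketch in Python, assuming the standard
unicodedata module (its data follows whichever Unicode version the
interpreter ships with). The variable and function names are illustrative,
and the SARA U contrast at the end is an addition for comparison, not part
of the dictionary example.

```python
import unicodedata

CHO_CHANG = "\u0E0A"        # THAI CHARACTER CHO CHANG
ASTERISK_BELOW = "\u0359"   # COMBINING ASTERISK BELOW
MAI_HAN_AKAT = "\u0E31"     # THAI CHARACTER MAI HAN-AKAT
SARA_U = "\u0E38"           # THAI CHARACTER SARA U (contrast case only)

# The two candidate encodings of the syllable onset.
mark_first = CHO_CHANG + ASTERISK_BELOW + MAI_HAN_AKAT
vowel_first = CHO_CHANG + MAI_HAN_AKAT + ASTERISK_BELOW


def combining_classes(s):
    """Return the canonical combining class of each code point in s."""
    return [unicodedata.combining(ch) for ch in s]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(combining_classes(mark_first))    # [0, 220, 0]
print(combining_classes(vowel_first))   # [0, 0, 220]

# MAI HAN-AKAT has class 0, so canonical reordering cannot move the
# asterisk across it: the two spellings stay distinct.
print(canonically_equivalent(mark_first, vowel_first))   # False

# Contrast: SARA U has a non-zero class (103), so either order of
# SARA U and the asterisk normalises to the same string.
print(canonically_equivalent(CHO_CHANG + SARA_U + ASTERISK_BELOW,
                             CHO_CHANG + ASTERISK_BELOW + SARA_U))   # True
```

This only shows that normalisation will not reconcile the two spellings; it
does not say which of them is the correct one.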
Richard.