Order of Infrequent Combining Marks in Thai

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 21 2007 - 03:29:13 CDT

    Who or what decides the correct order of combining marks in Thai-script
    strings when some of the marks belong to the 'inherited' class? Is the
    order established? I appreciate that the interaction of the marks
    themselves is undefined in such cases.

    The problem arises when copying the pronunciation from English-Thai
    dictionaries. For example, is the pronunciation of 'vision' correctly
    entered as
    วิชʹช͙ัน or วิชʹชั͙น? (This notation is taken from a pocket dictionary.)
    The first sequence has <U+0E0A THAI CHARACTER CHO CHANG, U+0359 COMBINING
    ASTERISK BELOW, U+0E31 THAI CHARACTER MAI HAN-AKAT> and the second has
    <U+0E0A, U+0E31, U+0359>. They are canonically inequivalent because U+0E31
    is of canonical combining class 0, so normalisation never reorders it
    relative to U+0359 (class 220). The Thai and Latin sequencing principles,
    plus the fact that there is a functional unit <U+0E0A, U+0359>
    representing the 'zh' sound, argue for <U+0E0A, U+0359, U+0E31>, but the
    overstrict Uniscribe implementation on Windows XP seems to favour
    <U+0E0A, U+0E31, U+0359>.
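
    The inequivalence can be checked mechanically. The following is a minimal
    sketch in Python using the standard unicodedata module; the helper names
    (canonically_equivalent, describe) and the variable names are my own, for
    illustration only. It compares the NFD forms of the two sequences and
    lists the canonical combining class of each character. It says nothing
    about which order is correct, or about what any renderer does.

```python
import unicodedata

CHO_CHANG = "\u0E0A"       # THAI CHARACTER CHO CHANG
ASTERISK_BELOW = "\u0359"  # COMBINING ASTERISK BELOW
MAI_HAN_AKAT = "\u0E31"    # THAI CHARACTER MAI HAN-AKAT

# The two candidate spellings of the syllable discussed above.
asterisk_first = CHO_CHANG + ASTERISK_BELOW + MAI_HAN_AKAT
vowel_first = CHO_CHANG + MAI_HAN_AKAT + ASTERISK_BELOW


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def describe(s: str) -> list[tuple[str, str, int]]:
    """Return (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in s
    ]


if __name__ == "__main__":
    for label, seq in (("asterisk first", asterisk_first),
                       ("vowel first", vowel_first)):
        print(label)
        for code_point, name, ccc in describe(seq):
            print(f"  {code_point}  ccc={ccc:<3}  {name}")
    print("canonically equivalent:",
          canonically_equivalent(asterisk_first, vowel_first))
```

    Because MAI HAN-AKAT has combining class 0, it acts as a barrier to
    canonical reordering, so both sequences are already in normalised form
    and remain distinct; a search or collation process that relies on
    normalisation alone will treat them as different strings.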

    Richard.


