RE: Why is U+17C1 of General Category Mc while U+0E40 and U+0EC0 are of category Lo?

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Wed Mar 31 2004 - 06:55:47 EST


    jcowan@reutershealth.com wrote:
    > Thai (and Lao, whose encoding closely parallels that of Thai) are
    > encoded in Unicode on unique principles: by a straight left-to-right
    > typewriter-style encoding. This was done for compatibility with the
    > pervasive Thai 8-bit standard. It also means that for collation
    > purposes what are historically left-side vowels must be moved after
    > the following consonant.

    For more on collation of Thai, Lao, and Khmer, see the proposed update
    to ISO/IEC 14651 CTT (and the UTS #10 DUCET), and a tailoring for the
    CTT, in these two documents:
    N2718 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2718.doc
    N2717 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2717.doc

    (Note that the "swapping" part of the tailoring for Thai/Lao is dealt
    with by other means (in the prehandling) in the Unicode collation
    algorithm.)
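
    To illustrate the kind of swapping meant here, a minimal Python sketch
    follows. It is not the text of either document nor the actual UCA
    prehandling; the names PREVOWELS, is_consonant and reorder_prevowels
    are hypothetical, and the consonant ranges are a simplification (any
    code point in U+0E01..U+0E2E or U+0E81..U+0EAE is treated as a
    consonant).

```python
# Sketch only: move a Thai/Lao left-side vowel after the following consonant,
# so that a comparison sees the consonant first. Names are hypothetical.

# Thai U+0E40..U+0E44 and Lao U+0EC0..U+0EC4 are the visually ordered prevowels.
PREVOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)} | {
    chr(cp) for cp in range(0x0EC0, 0x0EC5)
}


def is_consonant(ch: str) -> bool:
    """Simplified test: Thai KO KAI..HO NOKHUK or Lao KO..HO TAM block range."""
    cp = ord(ch)
    return 0x0E01 <= cp <= 0x0E2E or 0x0E81 <= cp <= 0x0EAE


def reorder_prevowels(text: str) -> str:
    """Swap each prevowel with the consonant that immediately follows it."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        if chars[i] in PREVOWELS and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)
```

    For example, reorder_prevowels("\u0e40\u0e01") gives "\u0e01\u0e40"
    (SARA E now follows KO KAI), while text without prevowels is returned
    unchanged.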

    > Note that the Thai characters are not labeled LETTER or VOWEL SIGN or
    > what have you, but simply CHARACTER.

    Yes, but that has no particular consequence. Note that the vowel signs
    are treated as vowel signs in the documents referenced above, regardless
    of whether they are called "LETTER", "VOWEL SIGN", or "CHARACTER" (and,
    actually, regardless of their general category, as it happens). There is
    also the complication that some of the consonant characters are
    logically used as vowel (parts), but the modern convention is to ignore
    that in the collation rules, and always treat them as consonants in
    collation.
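
    For reference, the general categories and names in question can be
    inspected with Python's standard unicodedata module. This is only a
    small sketch; the helper name describe is hypothetical, and the values
    returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(cp: int) -> tuple[str, str, str]:
    """Return (code point label, general category, character name)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


# The three characters from the subject line: Khmer, Thai and Lao "E" vowels.
for cp in (0x17C1, 0x0E40, 0x0EC0):
    print(*describe(cp))
```

    The Khmer vowel sign is reported as Mc and the Thai and Lao ones as Lo,
    which is the difference the subject line asks about.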

                    /kent k


