Korean [Was: 28th IUC paper - Tamil Unicode New]

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Aug 23 2005 - 01:55:15 CDT


    On Tuesday, August 23rd, 01:11Z Kenneth Whistler wrote:
    > Korean *should* be simple and straightforward.

    Should have been. Now it is a little bit late.

    > It isn't.
    >
    > Why? Because it wasn't encoded once in the standard -- it was
    > encoded *FOUR* times.
    >
    > Encoding #1: U+1100..U+11F9, as combining jamos
    > Encoding #2: U+AC00..U+D7A3, as preformed syllables
    > Encoding #3: U+3131..U+318E, as compatibility jamos
    > Encoding #4: U+FFA0..U+FFDC, as halfwidth jamos
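    Of the four, encodings #1 and #2 are related purely arithmetically: the
    Unicode Standard's Hangul syllable algorithm maps each precomposed
    syllable in U+AC00..U+D7A3 to a leading consonant, a vowel and an
    optional trailing consonant from the conjoining jamo block. A minimal
    Python sketch of that arithmetic, covering modern jamo only (the
    function names decompose_syllable and compose_syllable are illustrative,
    not from any library):

```python
# Constants of the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch: str) -> str:
    """Map one precomposed syllable (encoding #2) to conjoining jamo (#1).

    Compatibility (#3) and halfwidth (#4) jamo are outside this arithmetic;
    any character that is not a precomposed syllable is returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant".
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")


def compose_syllable(jamo: str) -> str:
    """Inverse mapping; assumes a valid modern L V or L V T jamo sequence."""
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) > 2 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print([hex(ord(c)) for c in decompose_syllable("한")])  # ['0x1112', '0x1161', '0x11ab']
print(compose_syllable("\u1112\u1161\u11ab") == "한")  # True
```

    Because the mapping is pure arithmetic, no table is needed to go between
    #1 and #2; the trouble comes from #3 and #4, which do not fit it.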

    I am not sure the latter two see much use, particularly the 3rd; I
    understand they complicate things for maximally conformant software (like
    the software Ken writes), with no useful purpose in practice.
    Nevertheless, that is beside the point.
    Ken is (probably deliberately) leaving out a fifth (!) encoding scheme for
    Hangul: U+3400..U+4DB5 (removed in 1996). It is probably forgotten by now
    (for the best), but it certainly was a headache some years ago.

    BTW, Latin with diacritics is encoded at least three times, and the same
    algorithms used to reduce it could be used for Korean, couldn't they?
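    Assuming "reduction" here means Unicode normalization, the same standard
    functions do apply to Korean, with a catch for the compatibility jamo. A
    small sketch using Python's standard unicodedata module; the helper name
    equivalent() is hypothetical:

```python
import unicodedata


def equivalent(a: str, b: str, form: str) -> bool:
    """True if a and b become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Latin: precomposed U+00E9 vs. "e" + U+0301 COMBINING ACUTE ACCENT.
print(equivalent("\u00e9", "e\u0301", "NFC"))             # True (canonical)

# Hangul encoding #2 vs. #1: U+D55C vs. U+1112 U+1161 U+11AB.
print(equivalent("\ud55c", "\u1112\u1161\u11ab", "NFC"))  # True (canonical)

# Encodings #3 and #4 vs. #1: compatibility and halfwidth KIYEOK vs. U+1100.
print(equivalent("\u3131", "\u1100", "NFC"))   # False: compatibility mapping only
print(equivalent("\u3131", "\u1100", "NFKC"))  # True
print(equivalent("\uffa1", "\u1100", "NFKC"))  # True (U+FFA1 -> U+3131 -> U+1100)

# Caveat: compatibility consonants map to *leading* jamo, so the spelled-out
# sequence U+314E U+314F U+3134 does not fold to the syllable U+D55C.
print(unicodedata.normalize("NFKC", "\u314e\u314f\u3134") == "\ud55c")  # False
```

    Encodings #1 and #2 are canonically equivalent, so NFC or NFD is enough;
    #3 and #4 fold only under NFKC/NFKD, and even then not into the syllables
    a reader would expect.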

    > But sorting *Korean* in Unicode

    Doesn't that mean collating Hanja as well? (That is, interleaving them
    with their Hangul readings, etc.)
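    Interleaving Hanja with their Hangul readings needs a Hanja-to-Hangul
    reading lookup. A sketch, assuming a hypothetical four-entry table
    HANJA_READINGS and a Hangul-before-Hanja tie-break (one possible
    convention, not a standard one). For modern syllables the code point
    order of U+AC00..U+D7A3 follows the jamo order, so plain string
    comparison stands in for a real collator here:

```python
# Hypothetical, tiny Hanja -> Hangul reading table; a real collator would need
# a full reading dictionary and would have to handle Hanja with several readings.
HANJA_READINGS = {"韓": "한", "國": "국", "大": "대", "學": "학"}


def sort_key(word: str) -> tuple:
    # Primary key: the word spelled entirely in Hangul (Hanja -> reading).
    reading = "".join(HANJA_READINGS.get(ch, ch) for ch in word)
    # Secondary key: Hangul spelling sorts before a Hanja spelling with the
    # same reading (False < True); the word itself breaks any remaining tie.
    has_hanja = any(ch in HANJA_READINGS for ch in word)
    return (reading, has_hanja, word)


words = ["韓國", "대학", "한국", "大學", "국가"]
print(sorted(words, key=sort_key))  # ['국가', '대학', '大學', '한국', '韓國']
```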

    Antoine



    This archive was generated by hypermail 2.1.5 : Tue Aug 23 2005 - 01:56:06 CDT