From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Aug 23 2005 - 01:55:15 CDT
On Tuesday, August 23rd, 01:11Z Kenneth Whistler wrote:
> Korean *should* be simple and straightforward.
Should have been. Now it is a little bit late.
> It isn't.
>
> Why? Because it wasn't encoded once in the standard -- it was
> encoded *FOUR* times.
>
> Encoding #1: U+1100..U+11F9, as combining jamos
> Encoding #2: U+AC00..U+D7A3, as preformed syllables
> Encoding #3: U+3131..U+318E, as compatibility jamos
> Encoding #4: U+FFA0..U+FFDC, as halfwidth jamos
I am not sure the latter two see much use, particularly the third; I
understand they complicate things for maximally conformant software (like
the software Ken writes), for no useful purpose in practice. Nevertheless,
the point lies elsewhere.
Ken is (probably on purpose) leaving out a fifth (!) encoding of Hangul:
U+3400..U+4DB5 (removed in 1996). It is probably forgotten by now (for the
best), but it certainly was a headache some years ago.
BTW, Latin with diacritics is encoded at least three times over, and the
same algorithms used to reduce Latin to a single form could be used for
Hangul, couldn't they?
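A minimal sketch of that idea, assuming Python's standard unicodedata module
and taking NFKC normalization as the "reduction" (my assumption; nothing above
names a specific algorithm). The helper name fold is made up for illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Reduce decomposed and compatibility forms to one composed form.

    NFKC is assumed here as the reduction step; `fold` is a hypothetical name.
    """
    return unicodedata.normalize("NFKC", text)


# Latin: precomposed letter, base + combining acute, fullwidth base + combining acute.
latin = ["\u00e9", "e\u0301", "\uff45\u0301"]

# Hangul: precomposed syllable, conjoining jamos, compatibility jamos, halfwidth jamos.
hangul = ["\ud558", "\u1112\u1161", "\u314e\u314f", "\uffbe\uffc2"]

for group in (latin, hangul):
    # Each group should collapse to a set with a single member.
    print({fold(s) for s in group})
```

The Hangul example deliberately uses an open syllable. The compatibility and
halfwidth jamos do not distinguish leading from trailing consonants (U+3134,
for instance, maps to the leading U+1102), so a closed syllable written with
them does not fold to the same result this way.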
> But sorting *Korean* in Unicode
Doesn't that mean collating Hanja characters as well? (And so intermixing
them with their Hangul readings, etc.)
Antoine