Korean [Was: 28th IUC paper - Tamil Unicode New]

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Aug 23 2005 - 01:55:15 CDT

Next message: Antoine Leca: "Re: 28th IUC paper - Tamil Unicode New"

Previous message: Doug Ewell: "Re: Questions re ISO-639-1,2,3"
In reply to: Kenneth Whistler: "Re: [indic] Re: 28th IUC paper - Tamil Unicode New"
Next in thread: Kenneth Whistler: "Re: Korean [Was: 28th IUC paper - Tamil Unicode New]"
Maybe reply: Kenneth Whistler: "Re: Korean [Was: 28th IUC paper - Tamil Unicode New]"

On Tuesday, August 23rd, 01:11Z Kenneth Whistler wrote:
> Korean *should* be simple and straightforward.

Should have been. Now it is a little bit late.

> It isn't.
>
> Why? Because it wasn't encoded once in the standard -- it was
> encoded *FOUR* times.
>
> Encoding #1: U+1100..U+11F9, as combining jamos
> Encoding #2: U+AC00..U+D7A3, as preformed syllables
> Encoding #3: U+3131..U+318E, as compatibility jamos
> Encoding #4: U+FFA0..U+FFDC, as halfwidth jamos
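Encodings #1 and #2 are linked by pure arithmetic. Below is a minimal Python sketch of the conjoining-jamo algorithm from chapter 3 of the Unicode Standard. The function names decompose_syllable and compose_syllable are illustrative and do not come from any library. The sketch covers only the modern jamos (19 leading consonants, 21 vowels, 27 trailing consonants) from which the 11,172 syllables in U+AC00..U+D7A3 are built. The archaic jamos elsewhere in U+1100..U+11F9 have no precomposed form.

```python
import unicodedata

# Constants of the conjoining-jamo algorithm (Unicode Standard, chapter 3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_syllable(ch: str) -> str:
    """Map one precomposed syllable (encoding #2) to conjoining jamos (encoding #1)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable: leave it alone
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # index 0 = no final consonant
    return lead + vowel + trail


def compose_syllable(lead: str, vowel: str, trail: str = "") -> str:
    """Inverse mapping; assumes valid modern jamos (no range checks)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = "\uD55C"  # HAN (한)
print([hex(ord(c)) for c in decompose_syllable(han)])  # ['0x1112', '0x1161', '0x11ab']
print(decompose_syllable(han) == unicodedata.normalize("NFD", han))  # True
print(compose_syllable("\u1112", "\u1161", "\u11AB") == han)  # True
```

Because the mapping is a closed formula, converting between these two encodings needs no tables. The other encodings are a different matter.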

I am not sure the latter two see much use, particularly the third; I
understand they create complications for maximally conformant software (like
the software Ken writes) while serving no useful purpose in practice.
But that is not my point.
Ken (probably on purpose) leaves out a fifth (!) encoding of Hangul:
U+3400..U+4DB5, removed in 1996. It is probably forgotten by now (for the
best), but it certainly was a headache some years ago.

By the way, Latin letters with diacritics are encoded at least three times,
and the same algorithms used to reduce those to a single form could be used
for Hangul, couldn't they?
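Unicode normalization (UAX #15) is the standard "reduction" machinery, and Python's unicodedata module implements it. The sketch below spells the syllable GA (가) in each of Ken's four encodings. It checks which normalization form folds each spelling to U+AC00 and applies the same check to a Latin e with a combining acute accent. The choice of GA and of NFC/NFKC as the test is illustrative and is not taken from the thread.

```python
import unicodedata

TARGET = "\uAC00"  # GA (가) as one precomposed syllable

# The same syllable spelled in each of the four encodings Ken lists.
spellings = {
    "#2 precomposed syllable": "\uAC00",
    "#1 conjoining jamos": "\u1100\u1161",
    "#3 compatibility jamos": "\u3131\u314F",
    "#4 halfwidth jamos": "\uFFA1\uFFC2",
}

for label, text in spellings.items():
    nfc = unicodedata.normalize("NFC", text)    # canonical equivalence only
    nfkc = unicodedata.normalize("NFKC", text)  # also folds compatibility variants
    print(f"{label:26} NFC: {nfc == TARGET}  NFKC: {nfkc == TARGET}")

# The same machinery folds e + COMBINING ACUTE ACCENT into precomposed U+00E9.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00E9")  # True

# Caveat: compatibility consonants decompose to *leading* jamos, so HAN spelled
# with compatibility jamos (ㅎ ㅏ ㄴ) yields HA (하) plus a stray leading NIEUN.
print(unicodedata.normalize("NFKC", "\u314E\u314F\u3134") == "\uD558\u1102")  # True
```

The first two spellings match under both NFC and NFKC. The compatibility and halfwidth spellings match only under NFKC. The last line shows that this reduction is not complete for syllables that have a final consonant.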

> But sorting *Korean* in Unicode

Doesn't that also mean collating Hanja characters (interleaving them with
their Hangul readings, and so on)?
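One possible reading of "intermixing Hanja with their Hangul reading" is a sort key that replaces each Hanja with its Hangul reading before comparing. Below is a minimal sketch of that idea, assuming a hypothetical four-entry table, HANJA_READINGS. Real data would come from a dictionary source such as the Unihan database's kHangul field. The sketch ignores Hanja with several readings and the initial-sound rule. It relies on code point order of modern precomposed syllables, which follows the South Korean jamo order.

```python
# Hypothetical Hanja -> Hangul reading table, for illustration only.
HANJA_READINGS = {"韓": "한", "漢": "한", "國": "국", "字": "자"}


def reading(text: str) -> str:
    """Replace each Hanja with its Hangul reading; other characters pass through."""
    return "".join(HANJA_READINGS.get(ch, ch) for ch in text)


def korean_sort_key(text: str) -> tuple:
    r = reading(text)
    # Primary: the Hangul reading (precomposed syllables compare in jamo order).
    # Secondary: a pure-Hangul spelling sorts before Hanja with the same reading.
    # Tertiary: code point order, only to make ties deterministic.
    return (r, text != r, text)


words = ["韓國", "국자", "한국", "漢字", "한자"]
print(sorted(words, key=korean_sort_key))  # ['국자', '한국', '韓國', '한자', '漢字']
```

A real collation would apply this reading step before the usual Hangul comparison, for example as a tailoring of the Unicode Collation Algorithm, and would not replace that comparison.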

Antoine

This archive was generated by hypermail 2.1.5 : Tue Aug 23 2005 - 01:56:06 CDT