RE: Problems/Issues with CJK and Unicode

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Fri Apr 07 2000 - 15:13:24 EDT


On Fri, 7 Apr 2000, Hoon Kim wrote:
> From: Mark.Conover@luminant.com [mailto:Mark.Conover@luminant.com]
>
>
>
> I have heard that there are "problems" with the way Unicode handles CJK
> script; perhaps due to the unification of some characters. Would someone
> in this list mind offering a bit more insight into this matter?

> "Sort" would be one of those problem.
> (For Korean and Japanese, you would expect to sort by pronunciation, which
> would be different than the order Unihan characters were placed on)

  Well, as others have already pointed out, it's NOT possible to get
a linguistically correct sort from Unicode code point order in any
language or script. (A similar complaint was raised about MS's proprietary
extension of EUC-KR, Windows-949, but that's NOT a valid point either: it
stems from ignorance of the fact that even in ISO-8859-1, sorting can never
be done by code point order in any country.) That's why we need separate
collation tables/maps for each individual locale. Once the pronunciation
map is complete (in some cases that's challenging, because some characters
are never used in a certain country and assigning a pronunciation could be
problematic), it's trivial to make such a collation map for CJK ideograms
in Korean (and Japanese).
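
  To make the point concrete, here is a minimal sketch in Python of sorting
by a pronunciation map. The five-entry table KOREAN_READING and the helper
korean_sort_key are made up for illustration; a real map would cover the
whole Unihan repertoire and handle characters with several readings. Using
the code point order of precomposed Hangul syllables as the ordering of the
readings is also a simplifying assumption, not a full Korean collation.

```python
# Hypothetical, tiny pronunciation map: ideograph -> Korean reading in Hangul.
# A real table would be far larger and would need to handle multiple readings.
KOREAN_READING = {
    "國": "국",
    "大": "대",
    "學": "학",
    "家": "가",
    "韓": "한",
}


def korean_sort_key(ch):
    """Sort key: by Korean reading first, then by code point as a tie-breaker.

    Characters without a known reading sort after all mapped characters.
    Comparing readings by code point is a simplification for this sketch.
    """
    reading = KOREAN_READING.get(ch)
    if reading is None:
        return (1, "", ch)
    return (0, reading, ch)


chars = ["國", "大", "學", "家", "韓"]

# Plain sorted() compares strings by code point, i.e. Unihan placement order.
by_code_point = sorted(chars)

# Sorting through the pronunciation map gives the order a Korean reader expects.
by_reading = sorted(chars, key=korean_sort_key)

print("code point order:", "".join(by_code_point))
print("reading order:   ", "".join(by_reading))
```

  The two orders differ for this sample, and a Japanese table of readings
plugged into the same key function would give yet another order.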

    Jungshik Shin


