RE: Problems/Issues with CJK and Unicode

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Fri Apr 07 2000 - 15:43:33 EDT


On Fri, 7 Apr 2000, Jungshik Shin wrote:

> On Fri, 7 Apr 2000, Hoon Kim wrote:
> > From: Mark.Conover@luminant.com [mailto:Mark.Conover@luminant.com]
> >
> > I have heard that there are "problems" with the way Unicode handles CJK
> > script; perhaps due to the unification of some characters. Would someone

> > "Sort" would be one of those problems.
> > (For Korean and Japanese, you would expect to sort by pronunciation, which
> > would be different than the order Unihan characters were placed on)
>
> Well, as others have already pointed out, it's NOT possible
> to sort per Unicode-point in any language script. (Similar problem was
......
> individual locale. Once pronunciation-map is complete (in some cases,
.....
> country and assigning pronunciation could be problematic), it's trivial
> to make such a collation map for CJK ideograms in Korean (and Japanese).

   Thanks to Rick McGowan, I realized I made a mistake when adding
"(and Japanese)". I should have added '??' next to Japanese, as I had some
doubts about Japanese. Better still, I shouldn't have mentioned
Japanese at all. For a moment, I forgot how CJK ideograms
are used and pronounced in Japanese when writing the above.

  Even for Korean (ko-KR locale), there *COULD* be a little
complexity because [r] cannot be placed at the beginning of a word
(nor can [n] when it is followed by certain vowels), if users are
not careful when converting Hangul to Hanja (a typical way
of Hanja/CJK ideogram input in Korea). For the ko-KP locale, this problem
doesn't exist, as it uses different orthographic rules that do not
honor the prohibition of [r] ([n]) at the beginning of a word. This
complexity is by no means unique to Unicode, though.
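
  To make the point concrete, here is a rough Python sketch of sorting
Hanja by Korean reading, and of how the word-initial [r]/[n] rule can
change the result. Everything in it is an assumption for illustration:
the READINGS table is a tiny hand-made sample (not a real pronunciation
map), the names reading_key and word_initial_form are made up, and the
rule is a simplified approximation of ko-KR orthography. It relies only
on the fact that precomposed Hangul syllables (U+AC00..U+D7A3) are laid
out in South Korean alphabetical order.

```python
# Hypothetical sample table: Hanja -> Sino-Korean reading in Hangul.
# A real pronunciation map would cover the whole Unihan repertoire.
READINGS = {
    "韓": "한", "家": "가", "山": "산",
    "大": "대", "李": "리", "女": "녀",
}

# Arithmetic layout of the precomposed Hangul syllable block.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
N_COUNT = V_COUNT * T_COUNT           # syllables per leading consonant
L_NIEUN, L_RIEUL, L_IEUNG = 2, 5, 11  # leading-consonant indices
# Vowel indices of ya, yeo, ye, yo, yu, i (simplified trigger set).
Y_VOWELS = {2, 6, 7, 12, 17, 20}


def word_initial_form(syllable):
    """Approximate ko-KR spelling of a reading at the start of a word."""
    lead, rest = divmod(ord(syllable) - S_BASE, N_COUNT)
    vowel = rest // T_COUNT
    if lead == L_RIEUL:
        # [r] is dropped before i/y vowels, otherwise it becomes [n].
        lead = L_IEUNG if vowel in Y_VOWELS else L_NIEUN
    elif lead == L_NIEUN and vowel in Y_VOWELS:
        # [n] is dropped before i/y vowels.
        lead = L_IEUNG
    return chr(S_BASE + lead * N_COUNT + rest)


def reading_key(ch, word_initial=False):
    """Sort key: the reading first, the code point as a tie-breaker."""
    reading = READINGS[ch]
    if word_initial:
        reading = word_initial_form(reading)
    return (reading, ch)


chars = ["韓", "李", "家", "女", "山", "大"]

by_code_point = sorted(chars)
by_reading = sorted(chars, key=reading_key)
by_initial = sorted(chars, key=lambda c: reading_key(c, word_initial=True))

print(by_code_point)  # Unicode order, unrelated to pronunciation
print(by_reading)     # ordered by the base readings in the table
print(by_initial)     # ordered by the ko-KR word-initial spellings
```

  With this sample, the second and third lists differ: the two
ideograms whose readings begin with [r] or [n] move once the
word-initial spelling is used as the key. A collation map has to
settle on one of the two readings, which is the kind of care I meant
above.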

    Jungshik Shin


