From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun May 11 2003 - 14:04:23 EDT
Here is your question, reformatted to always include real characters
and names.*
> Specifically, U+1102 (ᄂ) HANGUL CHOSEONG NIEUN, U+1103 (ᄃ) HANGUL
> CHOSEONG TIKEUT and U+1113 (ᄓ) HANGUL CHOSEONG NIEUN-KIYEOK are given
> the primary weight of 1832, 1833 and 1844, respectively. With these,
> U+1113 (ᄓ) HANGUL CHOSEONG NIEUN-KIYEOK will be sorted after U+1103
> (ᄃ) HANGUL CHOSEONG TIKEUT, right? Or am I missing something (I
> haven't read UTS #10 through, yet)?
> The order is different from the way (South) Koreans (at least, most
> Korean dictionary editors) expect them to be sorted. We expect U+1113
> (ᄓ) HANGUL CHOSEONG NIEUN-KIYEOK (and other cluster consonants whose
> first component is U+1102 (ᄂ) HANGUL CHOSEONG NIEUN. They're U+1114
> (ᄔ) HANGUL CHOSEONG SSANGNIEUN, U+1115 (ᄕ) HANGUL CHOSEONG
> NIEUN-TIKEUT, U+1116 (ᄖ) HANGUL CHOSEONG NIEUN-PIEUP) to be put after
> U+1102 (ᄂ) HANGUL CHOSEONG NIEUN but before U+1103 (ᄃ) HANGUL CHOSEONG
> TIKEUT. The same is true of any cluster Jamos.
> Is it UTC's intention to leave the task of making Hangul Jamos
> collate in accordance with (South) Koreans' expectation to (South)
> Korean specific tailoring?
We know that there are problems with Korean collation, particularly
with non-modern Korean characters, and that the fixes will most likely
involve a reordering of the Jamo characters as well as other changes.
We have been trying to work with the WG20 committee to resolve them,
due to a desire to maintain synchrony with ISO 14651 in weights.
Progress in that committee, unfortunately, has been exceedingly slow.
At the last committee meeting early this year, we agreed to work out
details of a requirements document by email, but there has been as yet
simply no response to the draft suggested by the UTC. So I am less
than sanguine about the prospects for any kind of timely resolution.
In the meantime, the work-around is to tailor the Jamo characters to
interleave the characters properly, and follow one of the approaches
in UCA 7.1.4 at
http://www.unicode.org/reports/tr10/tr10-10.html#Trailing_Weights.
Thanks for bringing this interleaving issue up; we should add a
description to section 7.1.4.
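
To make the interleaving concrete, here is a minimal Python sketch, not
an implementation of UCA or of any real tailoring syntax. It uses only
the three primary weights quoted above (assumed to be hexadecimal, as in
allkeys.txt) and a hypothetical tailored order that places the NIEUN
clusters between NIEUN and TIKEUT. The names DEFAULT_PRIMARY,
TAILORED_ORDER, sort_default and sort_tailored are made up for
illustration.

```python
# Primary weights quoted in the message (assumed hexadecimal, as in allkeys.txt).
DEFAULT_PRIMARY = {
    "\u1102": 0x1832,  # HANGUL CHOSEONG NIEUN
    "\u1103": 0x1833,  # HANGUL CHOSEONG TIKEUT
    "\u1113": 0x1844,  # HANGUL CHOSEONG NIEUN-KIYEOK
}

# Hypothetical tailored order: clusters that start with NIEUN are
# interleaved after NIEUN and before TIKEUT, as the question requests.
TAILORED_ORDER = ["\u1102", "\u1113", "\u1114", "\u1115", "\u1116", "\u1103"]
TAILORED_RANK = {ch: rank for rank, ch in enumerate(TAILORED_ORDER)}


def sort_default(jamos):
    """Sort single jamos by the quoted default primary weights."""
    return sorted(jamos, key=DEFAULT_PRIMARY.__getitem__)


def sort_tailored(jamos):
    """Sort single jamos by their position in the tailored order."""
    return sorted(jamos, key=TAILORED_RANK.__getitem__)


sample = ["\u1113", "\u1103", "\u1102"]
default_result = sort_default(sample)
tailored_result = sort_tailored(sample)

print([f"U+{ord(c):04X}" for c in default_result])
print([f"U+{ord(c):04X}" for c in tailored_result])
```

With the default weights U+1113 lands after U+1103; with the tailored
ranks it lands between U+1102 and U+1103. A real tailoring would also
have to deal with the trailing-weight issues discussed in 7.1.4, which
this sketch ignores.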
Mark
* Using http://oss.software.ibm.com/cgi-bin/icu/tr with the following
transform in "Compound 1" will change all instances of U+XXXX to add
the real character and its name, which makes it much easier to see
what is being described.
[:^ASCII:] hexandname
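
For readers without access to that page, a rough equivalent can be
sketched in Python with the standard unicodedata module. This is not the
ICU transform itself, only an approximation of its effect on U+XXXX
references; the function name annotate is hypothetical, and the names
come from whatever Unicode version the local Python ships.

```python
import re
import unicodedata

# Matches references such as U+1102 (four to six hex digits).
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def annotate(text):
    """Append the real character and its Unicode name to each U+XXXX."""
    def expand(match):
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            # Not a valid code point; leave the reference untouched.
            return match.group(0)
        ch = chr(value)
        name = unicodedata.name(ch, "<no name>")
        return f"{match.group(0)} ({ch}) {name}"
    return CODE_POINT.sub(expand, text)


print(annotate("U+1113 will be sorted after U+1103, right?"))
```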
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
----- Original Message -----
From: "Jungshik Shin" <jshin@mailaps.org>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: <unicode@unicode.org>
Sent: Saturday, May 10, 2003 19:22
Subject: Re: Proposed Update of UTS #10: Unicode Collation Algorithm
>
>
>
> On Fri, 9 May 2003, Mark Davis wrote:
>
> > There is a new Proposed Update of UTS #10: Unicode Collation
> > Algorithm, on:
> >
> > http://www.unicode.org/reports/tr10/tr10-10.html
>
> Just a quick question before reading it through and commenting on it.
> Will allkeys.txt for 4.0 keep weights given to Hangul Jamos? The
> following is written under the assumption that it will.
>
> Specifically, U+1102 (Nieun), U+1103 (Tikeut) and U+1113
> (Nieun-Kiyeok) are given the primary weight of 1832, 1833 and 1844,
> respectively. With these, U+1113 will be sorted after U+1103, right?
> Or am I missing something (I haven't read UTS #10 through, yet)? The
> order is different from the way (South) Koreans (at least, most
> Korean dictionary editors) expect them to be sorted. We expect U+1113
> (and other cluster consonants whose first component is U+1102.
> They're U+1114, U+1115, U+1116) to be put after U+1102 but before
> U+1103. The same is true of any cluster Jamos.
> Is it UTC's intention to leave the task of making Hangul Jamos
> collate in accordance with (South) Koreans' expectation to (South)
> Korean specific tailoring?
>
> Thanks,
>
> Jungshik
>
>
>