From: Jungshik Shin (jshin@mailaps.org)
Date: Sat Aug 30 2003 - 06:17:35 EDT
On Tue, 26 Aug 2003, Kent Karlsson wrote:
Kent,
Thank you for your work on Korean sorting, and sorry for my late reply.
I'll be very brief because I have something urgent to take care of.
> Jungshik Shin wrote:
>
> You may wish to look at
> http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051-hangulsort.pdf
> which contains a much updated version of my paper on the subject.
> The table entries are also found in plain text form at
> http://std.dkuug.dk/JTC1/SC22/WG20/docs/n1051t-table-hangulctt6.txt
Wow, you've created all these entries. Thanks.
> > After exchanging a thread of emails, Mark Davis and I found
> > that we are more or less on the same page as to how Hangul
> > letters should be collated.
> > In summary,
> >
> > 1. Weights for T, V, and L should be assigned in such a way that
> > T < V < L for all T, V, and L's
>
> That would be L < T < V; but that is complicated by the actual need for
> (the superficially contradictory) V < L < T < V, with the latter T and V
> after all scripts.
I'm not following you here. As far as South Korean collation rules are
concerned, 'T < V < L' works well in Mark's and my scheme for the most
generic form of Korean syllables, 'L+V+T*'.
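A minimal Python sketch of point 1, assuming only the 14 basic modern consonants and 10 basic modern vowels, written as Unicode conjoining jamo in their conventional South Korean order. The names BASIC_L, BASIC_V, BASIC_T and build_weights are illustrative and are not taken from any actual table; the sketch only shows one way to make every T weigh less than every V, and every V less than every L.

```python
# Basic (non-cluster) modern conjoining jamo, each class in the traditional
# South Korean order. Cluster jamo such as U+1101 are assumed to have been
# expanded into these basic letters before weighting.
BASIC_L = "\u1100\u1102\u1103\u1105\u1106\u1107\u1109\u110B\u110C\u110E\u110F\u1110\u1111\u1112"
BASIC_V = "\u1161\u1163\u1165\u1167\u1169\u116D\u116E\u1172\u1173\u1175"
BASIC_T = "\u11A8\u11AB\u11AE\u11AF\u11B7\u11B8\u11BA\u11BC\u11BD\u11BE\u11BF\u11C0\u11C1\u11C2"


def build_weights() -> dict[str, int]:
    """Give every T a lower weight than every V, and every V a lower weight
    than every L. Weight 0 is left unused for a possible terminator."""
    weights: dict[str, int] = {}
    for group in (BASIC_T, BASIC_V, BASIC_L):  # lowest class first
        for jamo in group:
            weights[jamo] = len(weights) + 1
    return weights


WEIGHTS = build_weights()

# 기 is <ᄀ, ᅵ>; 까 with its doubled initial expanded is <ᄀ, ᄀ, ᅡ>.
# At the second position a V meets an L, and V < L puts 기 first, matching
# the dictionary order in which every ㄱ syllable precedes every ㄲ syllable.
gi = [WEIGHTS[j] for j in "\u1100\u1175"]
kka = [WEIGHTS[j] for j in "\u1100\u1100\u1161"]
assert gi < kka
```

Leaving weight 0 free is a design choice here: it keeps room for a 'TERM' weight below every T, as in point 3.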
> The Vs at two radically different positions in the table
> are for different positions of the V in a syllable; V < L is for the first V
> in a syllable, T < V is for non-first Vs in a syllable.
Aha, you're talking about your scheme.
> > 2. Expand precomposed (cluster) Jamos into sequences of component
> > basic Jamos
>
> Needed for covering all combinations of Jamos. If limited to (a superset)
> of modern Jamo, this expansion can be avoided.
Absolutely.
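A minimal sketch of point 2, limited to modern Hangul. unicodedata.normalize("NFD", ...) splits each precomposed syllable into conjoining L, V and T jamo, but it leaves cluster jamo such as U+1101 intact (they have no canonical decomposition), so a hand-written table is needed for the second step. The table below is an assumption covering only the modern clusters, not Old Hangul, and expand_to_basic_jamo is a hypothetical name.

```python
import unicodedata

# Hypothetical expansion table for the *modern* cluster jamo only.
CLUSTER_EXPANSION = {
    # doubled initial consonants (L)
    "\u1101": "\u1100\u1100", "\u1104": "\u1103\u1103", "\u1108": "\u1107\u1107",
    "\u110A": "\u1109\u1109", "\u110D": "\u110C\u110C",
    # complex vowels (V), split into basic vowels
    "\u1162": "\u1161\u1175", "\u1164": "\u1163\u1175", "\u1166": "\u1165\u1175",
    "\u1168": "\u1167\u1175", "\u116A": "\u1169\u1161", "\u116B": "\u1169\u1161\u1175",
    "\u116C": "\u1169\u1175", "\u116F": "\u116E\u1165", "\u1170": "\u116E\u1165\u1175",
    "\u1171": "\u116E\u1175", "\u1174": "\u1173\u1175",
    # doubled and cluster final consonants (T)
    "\u11A9": "\u11A8\u11A8", "\u11AA": "\u11A8\u11BA", "\u11AC": "\u11AB\u11BD",
    "\u11AD": "\u11AB\u11C2", "\u11B0": "\u11AF\u11A8", "\u11B1": "\u11AF\u11B7",
    "\u11B2": "\u11AF\u11B8", "\u11B3": "\u11AF\u11BA", "\u11B4": "\u11AF\u11C0",
    "\u11B5": "\u11AF\u11C1", "\u11B6": "\u11AF\u11C2", "\u11B9": "\u11B8\u11BA",
    "\u11BB": "\u11BA\u11BA",
}


def expand_to_basic_jamo(text: str) -> str:
    """Split precomposed syllables into conjoining L/V/T jamo (NFD), then
    replace each modern cluster jamo with its basic components."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(CLUSTER_EXPANSION.get(ch, ch) for ch in decomposed)
```

For example, 닭 (U+B2ED) becomes ᄃ ᅡ ᆯ ᆨ, and 과 (U+ACFC) becomes ᄀ ᅩ ᅡ; characters outside the table pass through unchanged.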
> referenced above, which lists the weightings and contractions needed for
> avoiding this expansion in many (but not all) cases.
>
> > 3. Terminate every syllable with 'TERM', which has a lower weight than
> > all T's (there's an alternative to this, but we both favor this
> > over the alternative)
>
> This can be avoided if the weighting is done in a particular way.
> See my paper for details.
Indeed. However, I wonder whether avoiding TERM is worth adopting your
scheme, which seems more complex than Mark's and mine and also requires
pre-handling. Could you give me some rationale for preferring yours to
ours? Is it because it's better suited to tailoring for North Korean? I
haven't given much thought to North Korean collation rules recently (at
the moment, I would have to look them up again to refresh my memory).
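A sketch combining points 1 to 3 into a sort key, under stated assumptions: input is limited to precomposed modern syllables (U+AC00..U+D7A3), basic jamo are weighted so that TERM < T < V < L, and TERM is 0. The expansion table is deliberately partial (just enough for the example), so other cluster jamo raise KeyError; a real table would cover every cluster. All names are illustrative.

```python
import unicodedata

TERM = 0  # syllable terminator: lower than every T, V, and L weight

# Basic modern jamo in traditional order, weighted T < V < L (assumed subset).
BASIC_T = "\u11A8\u11AB\u11AE\u11AF\u11B7\u11B8\u11BA\u11BC\u11BD\u11BE\u11BF\u11C0\u11C1\u11C2"
BASIC_V = "\u1161\u1163\u1165\u1167\u1169\u116D\u116E\u1172\u1173\u1175"
BASIC_L = "\u1100\u1102\u1103\u1105\u1106\u1107\u1109\u110B\u110C\u110E\u110F\u1110\u1111\u1112"
WEIGHT = {j: i for i, j in enumerate(BASIC_T + BASIC_V + BASIC_L, start=1)}

# Hypothetical partial cluster table: ㄲ initial, ㅘ vowel, ㄲ final only.
EXPAND = {"\u1101": "\u1100\u1100", "\u116A": "\u1169\u1161", "\u11A9": "\u11A8\u11A8"}


def sort_key(word: str) -> list[int]:
    """Weights of the expanded jamo of each syllable, each syllable closed by TERM."""
    key: list[int] = []
    for ch in word:
        if not ("\uAC00" <= ch <= "\uD7A3"):
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        jamo = "".join(EXPAND.get(j, j) for j in unicodedata.normalize("NFD", ch))
        key.extend(WEIGHT[j] for j in jamo)  # KeyError for clusters not in EXPAND
        key.append(TERM)
    return key


# Without TERM, 가나 <ᄀ ᅡ ᄂ ᅡ> would meet 각 <ᄀ ᅡ ᆨ> at an L vs a T, and
# T < L would wrongly put 각 first; TERM < T restores 가나 < 각.
print(sorted(["까", "과", "가나", "곡", "기", "각", "고"], key=sort_key))
```

For modern precomposed syllables this order agrees with plain code-point order, so the example mainly checks the weighting; the scheme is meant for arbitrary L+V+T* jamo sequences, which this sketch does not parse.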
Jungshik
This archive was generated by hypermail 2.1.5 : Sat Aug 30 2003 - 06:52:55 EDT