Re: Problem with 3.0 Hangul Syllable Composition algorithm

From: John Cowan (jcowan@reutershealth.com)
Date: Mon May 15 2000 - 10:35:38 EDT


Mark Davis wrote:

> There was a problem when it came to normalization. Since the compatibility
> composition of jamo pieces was, in itself, not adequate for keyboard composition
> (one needs a more sophisticated algorithm in any event) it was felt best to
> simply remove the compatibility decompositions.
>
> The problem with normalization was that NFKC would not have produced canonical
> syllables had we left them in.

I'm confused. In KC normalization, the first macro-step converts all characters
with compatibility decompositions to their equivalents. That would have
disassembled ks_f to k_f s_f. Step 1 of Hangul composition would then immediately
undo this. (Obviously a smart KC algorithm would be able to skip both steps in
simple cases.)
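
To make the round trip concrete, here is a rough Python sketch of the two
composition steps as I understand them. The constants are the usual Hangul
syllable arithmetic values; the JAMO_PAIRS table and both function names are
hypothetical, and the table holds only two pairs read off the character names,
not the actual Unicode 2.1 data.

```python
# Hangul syllable arithmetic constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT

# ASSUMPTION: illustrative table with just two trailing-consonant pairs,
# inferred from the character names, not copied from Unicode 2.1.
JAMO_PAIRS = {
    ("\u11A8", "\u11BA"): "\u11AA",  # k_f + s_f -> ks_f
    ("\u11A8", "\u11A8"): "\u11A9",  # k_f + k_f -> kk_f
}


def compose_jamo(text):
    """Step 1 (sketch): merge adjacent simple jamo into complex jamo."""
    out = []
    for ch in text:
        if out and (out[-1], ch) in JAMO_PAIRS:
            out[-1] = JAMO_PAIRS[(out[-1], ch)]
        else:
            out.append(ch)
    return "".join(out)


def compose_syllables(text):
    """Step 2 (sketch): arithmetic L+V -> LV and LV+T -> LVT composition."""
    out = []
    for ch in text:
        if out:
            last = ord(out[-1])
            l_index = last - L_BASE
            v_index = ord(ch) - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            s_index = last - S_BASE
            t_index = ord(ch) - T_BASE
            # Only an LV syllable (no trailing consonant yet) can take a T.
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


pieces = "\u1100\u1161\u11A8\u11A8"  # k_i + a_m + k_f + k_f
with_step_1 = compose_syllables(compose_jamo(pieces))
without_step_1 = compose_syllables(pieces)
```

With step 1 in place the four jamo collapse to the single syllable U+AC02;
without it the result is U+AC01 followed by a stray k_f, which is the
"kak + k_f" outcome discussed below.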

> While if you dig back in time to 1998-12-16, in the draft version of
> http://www.unicode.org/unicode/reports/tr15/tr15-9.html, you will see the
> following [CC was the draft name for KC]:
>
> > Normalization Form CC Examples
> > ...
> > u' kakk k_i + a_m + k_f + k_f kak + k_f
> > Hangul syllables are not maintained.

I see that TR15-9 says that, but I maintain that it is in error relative
to Unicode 2.1. Not that it matters any more.
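
For anyone who wants to check what an implementation does today, a minimal
sketch using Python's standard unicodedata module follows. The variable names
are mine, and the results reflect whatever Unicode data the interpreter ships
with, so this illustrates the post-3.0 behaviour rather than anything in
Unicode 2.1.

```python
import unicodedata

# k_i + a_m + k_f + k_f, the input from the TR15-9 example.
pieces = "\u1100\u1161\u11A8\u11A8"

# NFKC composes the first three jamo into a syllable and leaves the
# second k_f as a separate trailing jamo.
nfkc_result = unicodedata.normalize("NFKC", pieces)

# ks_f (U+11AA) carries no decomposition mapping in current data.
ks_f_decomposition = unicodedata.decomposition("\u11AA")

print([f"U+{ord(ch):04X}" for ch in nfkc_result])
print(repr(ks_f_decomposition))
```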

> This is probably more detail than you really wanted, but I thought it might be
> interesting to you to see some of the progression in this particular case.

It certainly is interesting. It is important that the reasons for past
decisions be documented, so that they are not ignorantly reopened when all
the former decision-makers have dropped out.

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)