Re: Problem with 3.0 Hangul Syllable Composition algorithm

From: Mark Davis (markdavis@ispchannel.com)
Date: Mon May 15 2000 - 10:36:39 EDT


There was a problem when it came to normalization. Since the compatibility composition of jamo pieces was, in itself, not adequate for keyboard composition (one needs a more sophisticated algorithm in any event), it was felt best to simply remove the compatibility decompositions.

The problem with normalization was that NFKC would not have produced canonical syllables had we left them in. If you are interested in the exegesis, you can follow the trail of TR#15 back through its drafts.

In version #18 of TR#15 (the one that is part of Unicode 3.0), you'll see the remark:

> Normalization Forms KD and KC Examples
> ...
> u' kaks ki + am + ksf kaks Hangul syllables are maintained under normalization.*
>
> *In earlier versions of Unicode, jamo characters like ksf had compatibility mappings to kf + sf. These
> mappings were removed in Unicode 2.1.9 to ensure that Hangul syllables are maintained.)
>
If you dig back in time to the 1998-12-16 draft at http://www.unicode.org/unicode/reports/tr15/tr15-9.html, however, you will see the following [CC was the draft name for KC]:

> Normalization Form CC Examples
> ...
> u' kakk ki + am + kf + kf k ak + kf Hangul syllables are not maintained.
>
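
For anyone who wants to see the difference between the two quoted rows concretely, here is a rough Python sketch. It relies on the standard unicodedata module, which ships data far newer than 2.1.9, so the first function only shows today's behaviour. The OLD_COMPAT table and the function names are my own illustration: the table is a hypothetical reconstruction of just the two mappings mentioned in the quotes (ksf to kf + sf, and the doubled kf from the draft row), not a copy of the old data file.

```python
import unicodedata

# Hypothetical reconstruction of two of the removed compatibility mappings,
# taken only from the examples quoted above. Not the full historical table.
OLD_COMPAT = {
    "\u11AA": "\u11A8\u11BA",  # JONGSEONG KIYEOK-SIOS -> KIYEOK + SIOS (ksf -> kf + sf)
    "\u11A9": "\u11A8\u11A8",  # JONGSEONG SSANGKIYEOK -> KIYEOK + KIYEOK (kf + kf)
}


def nfkc_today(text: str) -> str:
    """NFKC as implemented by the Unicode data bundled with Python."""
    return unicodedata.normalize("NFKC", text)


def nfkc_with_old_mappings(text: str) -> str:
    """Simulate NFKC as if the jamo compatibility mappings were still present."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Apply the (hypothetical) old jamo mappings by hand.
    expanded = "".join(OLD_COMPAT.get(ch, ch) for ch in decomposed)
    # Canonical composition: only L + V (+ one T) can recombine.
    return unicodedata.normalize("NFC", expanded)


def show(text: str) -> str:
    """Render a string as a list of code points for easy comparison."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


kaks = "\uAC03"  # syllable built from U+1100 U+1161 U+11AA
kakk = "\uAC02"  # syllable built from U+1100 U+1161 U+11A9

print(show(nfkc_today(kaks)))              # U+AC03
print(show(nfkc_with_old_mappings(kaks)))  # U+AC01 U+11BA
print(show(nfkc_with_old_mappings(kakk)))  # U+AC01 U+11A8
```

The last two lines show the effect described in the draft: once the trailing consonant is split in two, composition can only reattach the first piece, so the original syllable does not come back.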

The precise specifications for normalization and the resultant data issues were discussed extensively in the UTC and ad hoc subcommittee meetings over a period of time. This particular change was approved in UTC #79 [Palo Alto, CA – February 3-5, 1999, hosted by Hewlett-Packard]:

[#79-M9] Motion: To remove compatibility mapping from characters U+1100 through U+11F9 (hangul jamo block).

If you look at ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.html, under "Modification History" for "Unicode 2.1.9", you will see:

> * Removed decompositions from the conjoining jamo block: U+1100..U+11F8.
>
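
If you would rather check the data than the change log, something like the following Python sketch should do. It assumes the standard unicodedata module, which reflects whatever Unicode version your Python was built with (much later than 2.1.9), and the function name and default range are just my choices for illustration. I used the wider U+1100..U+11F9 range from the motion.

```python
import unicodedata


def jamo_with_decompositions(first: int = 0x1100, last: int = 0x11F9) -> list[str]:
    """Return the code points in [first, last] that still carry a decomposition mapping."""
    return [
        f"U+{cp:04X}"
        for cp in range(first, last + 1)
        # decomposition() returns an empty string when no mapping is defined.
        if unicodedata.decomposition(chr(cp))
    ]


print(jamo_with_decompositions())  # an empty list means no mappings remain
```

An empty result is what you would expect if the removal is still in effect in the data your Python uses.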
This is probably more detail than you really wanted, but I thought it might be interesting to you to see some of the progression in this particular case.

If you want to watch for changes in particular documents, I'd recommend a service like http://mindit.netmind.com/. With it you can 'mark' any particular pages on the Unicode website (or other websites, of course). Whenever those pages change, you'll be notified (the service also highlights the changes for you).

Mark

John Cowan wrote:

> Mark Davis wrote:
>
> > Good catch. We didn't change the language when we dropped the mappings.
> > However, your suggested fix won't work, since the goal of removing the
> > table was precisely to not do the composition.
>
> Ah. Okay, what is the reason for this (AFAIK quiet) change?
>
> --
>
> Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
> Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
> Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan
> Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer)


