From: Jungshik Shin (jshin@mailaps.org)
Date: Fri May 09 2003 - 08:33:11 EDT
On Thu, 8 May 2003, Marco Cimarosti wrote:
> Jarkko Hietaniemi wrote:
> > Another potential Gedankenexperiment would of course be a
> > Cleanencoding, but I guess the WCode is already quite
> > good an attempt in that direction (though I must admit
> > that the WTF encoding makes me grimace a bit :-)
>
> Here is Markus' Wcode, for the benefit of new list members:
>
> http://www.mindspring.com/~markus.scherer/unicode/wcode.html
WCode, as it stands, is not 'clean' enough for my taste as far as the
Korean script is concerned.
* WCode contains all Unicode characters except ones with a
decomposition of any kind. Normalization on WCode only sorts
combining characters in canonical order. (This removes some
13000(?) characters from the BMP. WCode is mostly Unicode NFKD.)
If I could start from scratch, I'd remove all 'cluster Jamos' in the
U+1100 block in addition to the precomposed Hangul syllables (which are
removed by the above provision). That leaves us with 17 (+ 4) leading
consonants, 11 medial vowels and 17 (+ 4) trailing consonants, along
with the leading Jamo filler (U+115F) and the vowel filler (U+1160) [1],
totalling 55 code points, down from over 12,000 code points for Korean
script. That frees up a huge amount of code space in the BMP for *much
better* use. [2]
This has the additional benefit of making SCSU/BCU better suited for
Korean text represented in Jamos, because all Jamos can fall within a
single sliding window of SCSU/BCU. It also simplifies collation/sorting.
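
The counts above can be checked with a few lines of Python. This is only
a sketch of the arithmetic: the variable names are made up for
illustration, the 128-code-point figure is the size of an SCSU dynamic
window, and the check assumes the reduced Jamo set would be allocated
contiguously, which the message does not specify.

```python
# Tally of the proposed Jamo-only repertoire, using the counts quoted
# in the message. All names here are illustrative, not from any spec.
LEADING = 17 + 4    # leading consonants, "17 (+ 4)" as stated
VOWELS = 11         # medial vowels
TRAILING = 17 + 4   # trailing consonants, "17 (+ 4)" as stated
FILLERS = 2         # leading filler U+115F and vowel filler U+1160

proposed = LEADING + VOWELS + TRAILING + FILLERS

# Variant from footnote [1]: consonants encoded only once, plus an
# (assumed) third filler for the trailing position.
shared_consonants = (17 + 4) + VOWELS + (FILLERS + 1)

# An SCSU dynamic window covers 128 consecutive code points.
SCSU_WINDOW_SIZE = 128


def fits_in_one_window(count, window=SCSU_WINDOW_SIZE):
    """True if `count` contiguous code points fit in a single window."""
    return count <= window


if __name__ == "__main__":
    print(proposed, fits_in_one_window(proposed))
    print(shared_consonants, fits_in_one_window(shared_consonants))
```

For comparison, the existing Hangul Jamo block runs from U+1100 to
U+11FF, i.e. 256 positions, so it cannot be covered by one 128-code-point
window.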
Jungshik
[1] We can cut down the number of code points further by encoding
consonants only once (and perhaps adding a trailing consonant filler).
That gives 35 code points. In this scheme, a regular Korean syllable
takes the form of L+V+T+M? where L, V, and T include fillers. Similar
encodings were used in the mid-1980s on Korean Unix systems (before
KS C 5601-1987, now KS X 1001:1998).
[2] WCode already frees up 11,172 code points as it stands; my scheme
gives us back about 180-210 more.
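
The 11,172 figure is the number of precomposed Hangul syllables
(19 x 21 x 28), each of which has a canonical decomposition into
conjoining Jamos. The sketch below follows the arithmetic decomposition
defined in the Unicode Standard and compares it with Python's
`unicodedata`; the function name is made up for illustration. Note that
it stops at the existing conjoining Jamos: the further split of cluster
Jamos into basic letters proposed in the message is not shown.

```python
import unicodedata

# Constants of the Unicode Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch):
    """Return the conjoining-Jamo sequence for one precomposed syllable.

    Characters outside U+AC00..U+D7A3 are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    leading = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index 0 means the syllable has no trailing consonant.
    trailing = chr(T_BASE + t_index) if t_index else ""
    return leading + vowel + trailing


if __name__ == "__main__":
    syllable = "\ud55c"  # U+D55C
    jamos = decompose_syllable(syllable)
    print([f"U+{ord(c):04X}" for c in jamos])
    print(jamos == unicodedata.normalize("NFD", syllable))
```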