trying to understand the relationship between the Version 1 Hangul syllables and the later versions'
public at khwilliamson.com
Wed Jun 24 15:03:09 CDT 2015
On 06/19/2015 04:12 PM, Ken Whistler wrote:
> As usual, the situation is way more complicated than perhaps it has any
> It isn't just Version 1 Hangul that have to be considered, but also
> Version 1.1 Hangul.
> Version 1.0 contained 2350 Hangul syllables, encoded in the range
> Version 1.1 contained 6656 Hangul syllables, encoded in the range
> and a distinct new range 3D2E..4DFF. It thus added 4306 to what was in
> Version 1.0 already.
> Version 2.0 (and all subsequent versions) contained the 11172 Hangul
> syllables we now see, encoded in the range AC00..D7A3. Version 2.0
> *deleted* all the Hangul syllables in the range 3400..4DFF.
> You also need to pay attention to the history of the encoding of jamo.
> Version 1.0 contained 94 "Hangul Elements", encoded in the range
> Version 1.1 retained the same 94 "Hangul Letters" in the range 3131..318E.
> Version 1.1 added 240 conjoining jamo letters in the range 1100..11F9.
> Version 2.0 retained both of those sets.
> O.k., now what were those various chunks?
> The Unicode 1.0 set of 2350 was encoded for compatibility with KS C
> They were given no formal decompositions (the concept didn't yet exist),
> the implication in the standard was essentially that Hangul syllables could
> just be spelled out with jamo letter sequences. The details were an
> for implementation, however, and were soon overtaken by events in
> the Unicode/10646 merger.
> The Unicode 1.1 set of 4306 additions came from the 10646 merger work,
> and comprised two actual subsets:
> Hangul Supplementary Syllables A (1930 modern syllables) from KS C
> (See the Unicode 1.1 subrange: 3D2E..44BD.)
> Hangul Supplementary Syllables B (2376 old Korean syllables) from KS C
> (See the Unicode 1.1 subrange: 44BE..4DFF.)
> *All* of the Unicode 1.1 Hangul syllables were given decompositions.
> (Although the formalization of Unicode normalization did not yet exist.)
> The decompositions can be seen in UnicodeData-1.1.5.txt. Because the
> syllables were then encoded in three "alphabetical" extents, with a few
> stragglers tucked on, the decompositions were not algorithmically
> defined -- they were just
> enumerated in the data file. The decompositions involved the new set of
> conjoining jamo letters, rather than the older set, which were relegated
> to compatibility mapping status.
> The Unicode 2.0 set of 11,172 was known as the "Johab" set from KS C
> That was an algorithmically designed replacement of the earlier sets from
> Korean standards -- designed to cover all modern syllables algorithmically,
> by putting all the combinations of initial, medial and final jamos in a
> alphabetical order, whether or not each syllable that resulted was actually
> attested in modern Korean use.
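
For readers following along, the algorithmic layout of AC00..D7A3 described
above can be sketched in a few lines of Python. This is only an illustration,
not anything from Ken's message: the counts of 19 initials, 21 medials and 28
finals (27 finals plus "no final") are taken from the Hangul syllable
arithmetic in the Unicode Standard, and the function names compose() and
decompose() are made up for this example.

```python
# Hangul syllable arithmetic for the Unicode 2.0+ range AC00..D7A3.
# Constant values follow the Unicode Standard; function names are illustrative.
S_BASE = 0xAC00          # first precomposed syllable
L_COUNT = 19             # initial consonants
V_COUNT = 21             # medial vowels
T_COUNT = 28             # final consonants, including "no final" at index 0
N_COUNT = V_COUNT * T_COUNT   # syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT   # total number of syllables in the block


def compose(l_index, v_index, t_index=0):
    """Return the code point for the given initial/medial/final indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def decompose(code_point):
    """Return (initial, medial, final) indices for a syllable code point."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, remainder = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(remainder, T_COUNT)
    return l_index, v_index, t_index


if __name__ == "__main__":
    print(S_COUNT)                     # size of the block
    print(hex(compose(18, 20, 27)))    # last syllable in the block
    print(decompose(0xD55C))           # indices for U+D55C
```

Because every index combination gets a slot, the block size is simply the
product of the three counts, which is why unattested syllables are included.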
Does this mean the original 2 standards (KS C 5601-1987 and KS C
5657-1991) fell into disuse (or perhaps never were actually used) so
there was no need to map the new code points to them (hence no
> There was an enormous hullabaloo at the time, of course, about the changes
> required to switch over from the old ranges to the new set. But the whole
> shebang was balloted as Amendment 5 to ISO/IEC 10646-1:1993, and when
> that ballot passed, Unicode adopted the change wholesale into the
> documentation and data files for Unicode 2.0, to stay in synch.
> But "The Korean Mess", as it was then known, led directly to the
> by both SC2 and the UTC that such re-encoding of already standardized
> and published characters was enormously damaging to both standards.
> It was also expensive to the early implementers: Oracle, for example, long
> maintained distinct database support for the Unicode 1.1 Korean, which was
> incompatible with the Unicode 2.0 Korean.
> In any case, if anybody has any lingering questions about why the following
> policy exists and is *strictly* enforced:
> or why the applicable version for that stability policy is 2.0+, the
> answer is
> that it was a direct reaction to "The Korean Mess".
> On 6/19/2015 1:29 PM, Karl Williamson wrote:
>> I haven't found any information on this. It can't just be a
>> transliteration difference, because the number of code points is
>> vastly different between them.
>> Is it the case that the version 1 syllables are a failed abstraction
>> that was replaced by the later versions?