Re: Re-assignment of Hangul

From: Mark Davis (mark_davis@taligent.com)
Date: Wed Sep 27 1995 - 13:39:28 EDT


Subject: RE>Re-assignment of Hangul characters
Time: 9:21 AM    Date: 9/27/95

I believe that Ed has responded on the political side. The Unicode Consortium
is firmly in favor of this change, and believes that ISO will also accept it.

On the technical question you ask: the ordering of the new Hangul syllables
bears no relation to the ordering of any of the old syllable sets. I'm sorry
that we don't have a mapping table to give you right away; we have not yet
received it from Microsoft (which has signed up to produce it).

In the meantime you can use the following information to generate a mapping
yourself. The algorithm describes how to decompose the new Hangul syllables
into jamos; you can then use the Unicode character database (on the FTP site
at unicode.org) to map from jamos into old syllables.

Mark

=============================
Hangul Syllable Decomposition

The following describes how to take Hangul Syllable S and derive the
decomposition C.
First define the following constants (the first four are hexadecimal Unicode
character values; the remainder are decimal):

SBase = AC00
LBase = 1100
VBase = 1161
TBase = 11A7

SCount = 11172
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount
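The same constants can be written down as a minimal Python sketch (assuming Python 3; names such as S_BASE are just illustrative spellings of SBase and friends). It also makes explicit why SCount is 11172: it equals LCount * NCount. TBase (11A7) sits one below the first trailing consonant at U+11A8, so a trailing index of 0 stands for "no trailing consonant", which is why TCount is 28 rather than 27.

```python
# Constants from the description above; hex values as Python hex literals.
S_BASE = 0xAC00
L_BASE = 0x1100
V_BASE = 0x1161
T_BASE = 0x11A7   # one below U+11A8, so trailing index 0 means "none"

L_COUNT = 19
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT   # syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # total number of syllables

print(N_COUNT, S_COUNT)                    # 588 11172
print(f"{S_BASE + S_COUNT - 1:04X}")       # D7A3, the last syllable
```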

A. Compute the index of the syllable:

SIndex = S - SBase

B. If SIndex is in range (0 <= SIndex < SCount), then compute the components as follows:

L = LBase+ TRUNC( SIndex / NCount )
V = VBase+ TRUNC( MOD( SIndex, NCount ) / TCount )
T = TBase+ MOD( SIndex, TCount )
                
C1. If T = TBase, then there is no trailing character, so replace S by the
sequence <L, V>.

C2. Otherwise there is a trailing character, so replace S by the sequence <L,
V, T>.
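Steps A through C2 can be sketched in Python roughly as follows. This is an illustrative sketch rather than a reference implementation: the function name decompose_hangul is hypothetical, and passing non-syllable code points through unchanged is an assumption about what a caller would want.

```python
from typing import List

# Constants as defined above (hex values as Python hex literals).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = L_COUNT * N_COUNT   # 11172


def decompose_hangul(s: int) -> List[int]:
    """Decompose Hangul syllable s into jamo code points (steps A to C2).

    Hypothetical helper: code points outside the syllable range are
    returned unchanged as a one-element list.
    """
    s_index = s - S_BASE                              # A. syllable index
    if not 0 <= s_index < S_COUNT:                    # B. range test on SIndex
        return [s]
    l = L_BASE + s_index // N_COUNT                   # TRUNC(SIndex / NCount)
    v = V_BASE + (s_index % N_COUNT) // T_COUNT       # TRUNC(MOD(SIndex, NCount) / TCount)
    t = T_BASE + s_index % T_COUNT                    # MOD(SIndex, TCount)
    if t == T_BASE:                                   # C1. no trailing character
        return [l, v]
    return [l, v, t]                                  # C2. trailing character present


# Usage: print each decomposition in the same hex notation as the text.
for syllable in (0xAC00, 0xD4DB):
    jamos = ", ".join(f"{c:04X}" for c in decompose_hangul(syllable))
    print(f"{syllable:04X} => {jamos}")
# AC00 => 1100, 1161
# D4DB => 1111, 1171, 11B6
```

Running every code point from AC00 through D7A3 through such a function yields the syllable-to-jamo half of the mapping; the jamo-to-old-syllable half still has to come from the character database.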

Example: for S = D4DB, SIndex = 28DB (10459 decimal), which gives:

L = LBase+ 17
V = VBase+ 16
T = TBase+ 15

D4DB => 1111, 1171, 11B6
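The arithmetic behind this example can be reproduced with two divmod calls. Because NCount is a multiple of TCount, taking MOD(SIndex, NCount) modulo TCount gives the same value as MOD(SIndex, TCount). The final assertion recombines the indices with the inverse formula SIndex = (LIndex * VCount + VIndex) * TCount + TIndex; that formula is not stated above and is included only as a consistency check.

```python
S = 0xD4DB
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
N_COUNT = V_COUNT * T_COUNT                   # 588

s_index = S - S_BASE                          # 0x28DB == 10459
l_index, rest = divmod(s_index, N_COUNT)      # 17, 463
v_index, t_index = divmod(rest, T_COUNT)      # 16, 15
print(l_index, v_index, t_index)              # 17 16 15

# Jamo code points, printed in the same hex notation as the text.
jamos = (0x1100 + l_index, 0x1161 + v_index, 0x11A7 + t_index)
print(", ".join(f"{c:04X}" for c in jamos))   # 1111, 1171, 11B6

# Consistency check (inverse formula, not part of the description above).
assert S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index == S
```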

--------------------------------------
Date: 9/27/95 1:49 AM
To: Mark Davis
From: Len Greenwood

I have some issues regarding the proposed move of Hangul characters
from their current allocations in Unicode 1.1/ISO 10646. Maybe
someone can give me some more information...

On the Unicode ftp site there is a document called hangul-codes.txt
which purports to give the mapping between Johab and Wansung codes
for Hangul characters for Unicode 2.0. From this I can deduce that
characters in Unicode 1.1 in the range U+3400-U+3D2D are moving
at version 2.0, and, from the maps provided for KSC5601-1987, which
is the Wansung encoding, where they are moving to. I can guess that
the characters in the 1.1 "Hangul Supplementary A/B" blocks, U+3D2E-U+4DFF
will also move for 2.0, but I've no idea how they are distributed
through the new range.
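The sizes of the ranges mentioned here are easy to check, assuming both boundaries are inclusive. The first range holds 2350 code points, the number of precomposed Hangul syllables in KS C 5601-1987, which is consistent with it tracking the Wansung set; the two 1.1 ranges together hold 6656, while the 2.0 range AC00..D7A3 holds all 11172 combinations. The helper block_size is hypothetical.

```python
def block_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


ksc_block = block_size(0x3400, 0x3D2D)    # Unicode 1.1 Wansung-ordered Hangul
supp_block = block_size(0x3D2E, 0x4DFF)   # Unicode 1.1 Hangul Supplementary A/B
new_block = block_size(0xAC00, 0xD7A3)    # Unicode 2.0 Hangul syllables

print(ksc_block, supp_block, ksc_block + supp_block)  # 2350 4306 6656
print(new_block)                                      # 11172
```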

We are about to ship alpha software to Korea this week. At this point
only Wansung codes are involved. I can change our mappings from
Unicode 1.1 to Unicode 2.0 positions easily enough, but I am concerned
whether this is their final resting place. Since the customer will be
storing Korean data in Unicode, I don't want to face any more data
conversion scenarios than necessary. So, a couple of questions:

1. Can anyone confirm that the allocation of U+AC00 onwards for Unicode 2.0
   is set in concrete (or, how firm is the concrete right now)? And will
   ISO 10646 ratify the same positions? This is important to us for
   deciding whether to stick with the Unicode 1.1 that we know, or
   take a flier on going to Unicode 2.0 positions in the hope that it
   will remove a conversion later on.

2. Any chance of extending hangul-codes.txt (or providing another file)
   that maps ALL the 1.1 Hangul to their 2.0 places - or, better,
   just add the names for all 11,172 characters (e.g. first is
   HANGUL SYLLABLE KIYEOK A, etc.) Or, since I'm sure the series is
   regularly formed, give the normative names for all 19 leading
   consonants, 21 vowels & 28 trailing consonants _and_the_order_being_
   used_to_generate_the_characters, and one can figure it out.

3. Does anyone have a target date when this character move will be
   confirmed for good?
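Question 2 can be answered mechanically once the component names and their order are fixed. The sketch below assumes the short jamo names later published in Jamo.txt of the Unicode Character Database (which postdate this thread and differ from the "KIYEOK A" style suggested above), listed in code point order so that the indices match the decomposition algorithm. The function syllable_name is hypothetical.

```python
# Short jamo names from Jamo.txt (Unicode Character Database), listed in
# code point order (U+1100.., U+1161.., U+11A8..). An empty string means
# the jamo contributes nothing to the syllable name.
L_NAMES = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
           "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
V_NAMES = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
           "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
T_NAMES = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM",
           "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS",
           "NG", "J", "C", "K", "T", "P", "H"]

S_BASE = 0xAC00
N_COUNT = len(V_NAMES) * len(T_NAMES)   # 588
S_COUNT = len(L_NAMES) * N_COUNT        # 11172


def syllable_name(s: int) -> str:
    """Build a syllable name from its leading, vowel and trailing indices."""
    s_index = s - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{s:04X} is not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, len(T_NAMES))
    return "HANGUL SYLLABLE " + L_NAMES[l_index] + V_NAMES[v_index] + T_NAMES[t_index]


print(syllable_name(0xAC00))  # HANGUL SYLLABLE GA
print(syllable_name(0xD4DB))  # HANGUL SYLLABLE PWILH
```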

==============================================================================
   Len Greenwood Internet: greenwood@vmark.co.uk
   VMark Software Ltd. Tel: +44 1908 234990 ext 206
      Power House Fax: +44 1908 234992
      Davy Avenue, Knowlhill
      Milton Keynes, MK5 8HJ, United Kingdom

------------------ RFC822 Header Follows ------------------
Date: Wed, 27 Sep 95 01:28:32 -0700
From: unicode@Unicode.ORG
Message-Id: <9509270828.AA07980@Unicode.ORG>
Reply-To: leng@vmark.co.uk (Len Greenwood)
Errors-To: uni-bounce@Unicode.ORG
Subject: Re-assignment of Hangul characters
To: unicode@Unicode.ORG



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:30 EDT