L2/04-392

Contact: Andrew C West
Subject: Unicode 1 Hangul Mapping Errors
Date/Time: Thu Oct 21 06:46:12 CST 2004

There appear to be some mistakes in the mapping between Unicode 1.0/1.1 and Unicode 2.0 code points for Hangul syllables in the mapping table HANGUL.TXT. My analysis of the mapping errors follows.

1. Suspect rows in HANGUL.TXT (Col 5 = Unicode 1.0, Col 6 = Unicode 1.1, Col 7 = Unicode 2.0) :

1747	B4D3	B4D3	93AB	-	48B3	B2D2	NIEUN I RIEULMIEUM
1750	-	8899	93AE	-	48B4	B2D5	NIEUN I RIEULTHIEUTH
1751	-	889A	93AF	35AA	35AA	B2D6	NIEUN I RIEULPHIEUPH
5284	BBE5	BBE5	ABB5	384E	384E	C0A3	SSANGPIEUP I SIOS
5285	-	98A4	ABB6	-	40BC	C0A4	SSANGPIEUP I SSANGSIOS
9067	-	B19E	C5B8	-	436C	CF6A	KHIEUKH O CIEUC
9068	-	B19F	C5B9	-	-	CF6B	KHIEUKH O CHIEUCH
9737	-	B881	CA56	-	-	D208	THIEUTH OE SSANGSIOS
9877	-	BA46	CAF6	-	43DA	D294	THIEUTH WI SSANGSIOS
9888	-	BA4F	CB44	-	43FB	D29F	THIEUTH YU KIYEOKSIOS
9916	-	BA6B	CB64	-	-	D2BB	THIEUTH EU KIYEOKSIOS

2. Decomposition of Unicode 2.0 characters in HANGUL.TXT :

Row 1747 : B2D2 => <1102 1175 11B1> [NIEUN, I, RIEUL-MIEUM]
Row 1750 : B2D5 => <1102 1175 11B4> [NIEUN, I, RIEUL-THIEUTH]
Row 1751 : B2D6 => <1102 1175 11B5> [NIEUN, I, RIEUL-PHIEUPH]
Row 5284 : C0A3 => <1108 1175 11BA> [SSANGPIEUP, I, SIOS]
Row 5285 : C0A4 => <1108 1175 11BB> [SSANGPIEUP, I, SSANGSIOS]
Row 9067 : CF6A => <110F 1169 11BD> [KHIEUKH, O, CIEUC]
Row 9068 : CF6B => <110F 1169 11BE> [KHIEUKH, O, CHIEUCH]
Row 9737 : D208 => <1110 116C 11BB> [THIEUTH, OE, SSANGSIOS]
Row 9877 : D294 => <1110 1171 11BB> [THIEUTH, WI, SSANGSIOS]
Row 9888 : D29F => <1110 1172 11AA> [THIEUTH, YU, KIYEOK-SIOS]
Row 9916 : D2BB => <1110 1173 11AA> [THIEUTH, EU, KIYEOK-SIOS]

Comments:
A. The short names of the decomposed sequences match the syllable names in HANGUL.TXT, suggesting that the Unicode 2.0 character is the correct one for its row.
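
The decompositions in section 2 can be reproduced arithmetically. The following is a minimal Python sketch, assuming the standard arithmetic for precomposed Hangul syllables (base U+AC00, 21 vowels, 28 trailing positions). The function names decompose and row_number are illustrative only, and the row numbering rule (row = offset from U+AC00, plus one) is inferred from the eleven rows listed above rather than taken from any description of HANGUL.TXT.

```python
# Arithmetic decomposition of a precomposed Hangul syllable into conjoining jamo.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose(syllable):
    """Return the (L, V, T) jamo code points for a Unicode 2.0 syllable."""
    s_index = syllable - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l = L_BASE + s_index // (V_COUNT * T_COUNT)
    v = V_BASE + (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t = T_BASE + s_index % T_COUNT  # t == T_BASE means no trailing consonant
    return (l, v, t)


def row_number(syllable):
    """Row in HANGUL.TXT, ASSUMING rows are numbered from 1 in code point order."""
    return syllable - S_BASE + 1


for cp in (0xB2D2, 0xCF6B, 0xD208, 0xD294, 0xD29F):
    jamo = " ".join(f"{j:04X}" for j in decompose(cp))
    print(f"Row {row_number(cp)} : {cp:04X} => <{jamo}>")
```

Under that numbering assumption, each Unicode 2.0 code point in column 7 falls on the row whose number and name it is listed against.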

3. Corresponding lines from UnicodeData-1.1.5.txt for Unicode 1.0/1.1 characters :

Row 1747 = 48B3;HANGUL SYLLABLE NIEUN I RIEUL-THIEUTH;Lo;0;L;1102 1175 11B4;;;;N;;;;;
Row 1750 = 48B4;HANGUL SYLLABLE NIEUN I RIEUL-PHIEUPH;Lo;0;L;1102 1175 11B5;;;;N;;;;;
Row 1751 = 35AA;HANGUL SYLLABLE NIEUN I RIEUL-MIEUM;Lo;0;L;1102 1175 11B1;;;;N;;;;;
Row 5284 = 384E;HANGUL SYLLABLE SSANGPIEUP I SSANGSIOS;Lo;0;L;1108 1175 11BB;;;;N;;;;;
Row 5285 = 40BC;HANGUL SYLLABLE SSANGPIEUP I SIOS;Lo;0;L;1108 1175 11BA;;;;N;;;;;
Row 9067 = 436C;HANGUL SYLLABLE KHIEUKH O CHIEUCH;Lo;0;L;110F 1169 11BE;;;;N;;;;;
Row 9068 = N/A
Row 9737 = N/A
Row 9877 = 43DA;HANGUL SYLLABLE THIEUTH OE SSANGSIOS;Lo;0;L;1110 116C 11BB;;;;N;;;;;
Row 9888 = 43FB;HANGUL SYLLABLE THIEUTH EU KIYEOK-SIOS;Lo;0;L;1110 1173 11AA;;;;N;;;;;
Row 9916 = N/A

Comments:
A. The decomposition sequences for the Unicode 1.1 characters in UnicodeData-1.1.5.txt differ from the decompositions of the Unicode 2.0 characters to which HANGUL.TXT maps them.
B. There are no Unicode 1.1 characters in UnicodeData-1.1.5.txt with the decompositions for CF6A <110F 1169 11BD>, D294 <1110 1171 11BB>, and D29F <1110 1172 11AA> given in HANGUL.TXT.

4. Composition of the decomposition sequences for 1.1 characters in UnicodeData-1.1.5.txt :

48B3 [HANGUL SYLLABLE NIEUN I RIEUL-THIEUTH] = <1102 1175 11B4> [NIEUN, I, RIEUL-THIEUTH] => U+B2D5 [HANGUL SYLLABLE NILT]
48B4 [HANGUL SYLLABLE NIEUN I RIEUL-PHIEUPH] = <1102 1175 11B5> [NIEUN, I, RIEUL-PHIEUPH] => U+B2D6 [HANGUL SYLLABLE NILP]
35AA [HANGUL SYLLABLE NIEUN I RIEUL-MIEUM] = <1102 1175 11B1> [NIEUN, I, RIEUL-MIEUM] => U+B2D2 [HANGUL SYLLABLE NILM]
384E [HANGUL SYLLABLE SSANGPIEUP I SSANGSIOS] = <1108 1175 11BB> [SSANGPIEUP, I, SSANGSIOS] => U+C0A4 [HANGUL SYLLABLE BBISS]
40BC [HANGUL SYLLABLE SSANGPIEUP I SIOS] = <1108 1175 11BA> [SSANGPIEUP, I, SIOS] => U+C0A3 [HANGUL SYLLABLE BBIS]
436C [HANGUL SYLLABLE KHIEUKH O CHIEUCH] = <110F 1169 11BE> [KHIEUKH, O, CHIEUCH] => U+CF6B [HANGUL SYLLABLE KOC]
43DA [HANGUL SYLLABLE THIEUTH OE SSANGSIOS] = <1110 116C 11BB> [THIEUTH, OE, SSANGSIOS] => U+D208 [HANGUL SYLLABLE TOESS]
43FB [HANGUL SYLLABLE THIEUTH EU KIYEOK-SIOS] = <1110 1173 11AA> [THIEUTH, EU, KIYEOK-SIOS] => U+D2BB [HANGUL SYLLABLE TEUGS]

Comments:
A. The short names of the decomposed sequences match the syllable names, suggesting that the decomposition given in UnicodeData-1.1.5.txt is correct.
B. The composed syllables (Unicode 2.0 characters) derived from the 1.1.5 decomposition mapping do not match the Unicode 2.0 character mapped to the corresponding Unicode 1.0/1.1 character in HANGUL.TXT.
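
The composition step in section 4 can be sketched in the same way. The Python below is an illustration only: compose and expected_unicode2 are hypothetical helper names, and the parser assumes the semicolon-separated layout of the UnicodeData-1.1.5.txt lines quoted in section 3, with the decomposition in the sixth field.

```python
# Compose the jamo sequence from a UnicodeData-1.1.5.txt line into the
# Unicode 2.0 syllable it corresponds to.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose(l, v, t=T_BASE):
    """Compose jamo L, V and optional T into a precomposed syllable."""
    l_index, v_index, t_index = l - L_BASE, v - V_BASE, t - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo outside the composable range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def expected_unicode2(unicode_data_line):
    """Return (Unicode 1.1 code point, composed Unicode 2.0 code point)."""
    fields = unicode_data_line.split(";")
    old = int(fields[0], 16)
    jamo = [int(cp, 16) for cp in fields[5].split()]
    return old, compose(*jamo)


line = "48B3;HANGUL SYLLABLE NIEUN I RIEUL-THIEUTH;Lo;0;L;1102 1175 11B4;;;;N;;;;;"
old, new = expected_unicode2(line)
print(f"{old:04X} => U+{new:04X}")
```

Comparing the composed value with column 7 of the row on which HANGUL.TXT lists the Unicode 1.1 code point gives the mismatch described in comment B.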

5. Conclusion
A. The decomposition mappings in UnicodeData-1.1.5.txt are correct.
B. The mappings given for the Unicode 1.0/1.1 characters in rows 1747, 1750, 1751, 5284, 5285, 9067, 9068, 9737, 9877, 9888 and 9916 of HANGUL.TXT are incorrect:

Row 1747 Cols.5 & 6 : "- 48B3" should be "35AA 35AA" (from row 1751)
Row 1750 Cols.5 & 6 : "- 48B4" should be "- 48B3" (from row 1747)
Row 1751 Cols.5 & 6 : "35AA 35AA" should be "- 48B4" (from row 1750)
Row 5284 Cols.5 & 6 : "384E 384E" should be "- 40BC" (from row 5285)
Row 5285 Cols.5 & 6 : "- 40BC" should be "384E 384E" (from row 5284)
Row 9067 Cols.5 & 6 : "- 436C" should be "- -" (from row 9068)
Row 9068 Cols.5 & 6 : "- -" should be "- 436C" (from row 9067)
Row 9737 Cols.5 & 6 : "- -" should be "- 43DA" (from row 9877)
Row 9877 Cols.5 & 6 : "- 43DA" should be "- -" (from row 9737)
Row 9888 Cols.5 & 6 : "- 43FB" should be "- -" (from row 9916)
Row 9916 Cols.5 & 6 : "- -" should be "- 43FB" (from row 9888)

Consequently these rows should be amended to :

1747	B4D3	B4D3	93AB	35AA	35AA	B2D2	NIEUN I RIEULMIEUM
1750	-	8899	93AE	-	48B3	B2D5	NIEUN I RIEULTHIEUTH
1751	-	889A	93AF	-	48B4	B2D6	NIEUN I RIEULPHIEUPH
5284	BBE5	BBE5	ABB5	-	40BC	C0A3	SSANGPIEUP I SIOS
5285	-	98A4	ABB6	384E	384E	C0A4	SSANGPIEUP I SSANGSIOS
9067	-	B19E	C5B8	-	-	CF6A	KHIEUKH O CIEUC
9068	-	B19F	C5B9	-	436C	CF6B	KHIEUKH O CHIEUCH
9737	-	B881	CA56	-	43DA	D208	THIEUTH OE SSANGSIOS
9877	-	BA46	CAF6	-	-	D294	THIEUTH WI SSANGSIOS
9888	-	BA4F	CB44	-	-	D29F	THIEUTH YU KIYEOKSIOS
9916	-	BA6B	CB64	-	43FB	D2BB	THIEUTH EU KIYEOKSIOS
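
As a cross-check on the amended rows, the following Python sketch compares each row's Unicode 1.1 code point (column 6) against its Unicode 2.0 code point (column 7) by composing the 1.1 decomposition from UnicodeData-1.1.5.txt. The data is transcribed from sections 1, 3 and 5 of this document. The names SUSPECT, AMENDED, DECOMP_1_1, mismatched_rows and same_old_values are illustrative, and None stands for "-".

```python
from collections import Counter

# row: (Unicode 1.0, Unicode 1.1, Unicode 2.0), None standing for "-"
SUSPECT = {
    1747: (None, 0x48B3, 0xB2D2), 1750: (None, 0x48B4, 0xB2D5),
    1751: (0x35AA, 0x35AA, 0xB2D6), 5284: (0x384E, 0x384E, 0xC0A3),
    5285: (None, 0x40BC, 0xC0A4), 9067: (None, 0x436C, 0xCF6A),
    9068: (None, None, 0xCF6B), 9737: (None, None, 0xD208),
    9877: (None, 0x43DA, 0xD294), 9888: (None, 0x43FB, 0xD29F),
    9916: (None, None, 0xD2BB),
}
AMENDED = {
    1747: (0x35AA, 0x35AA, 0xB2D2), 1750: (None, 0x48B3, 0xB2D5),
    1751: (None, 0x48B4, 0xB2D6), 5284: (None, 0x40BC, 0xC0A3),
    5285: (0x384E, 0x384E, 0xC0A4), 9067: (None, None, 0xCF6A),
    9068: (None, 0x436C, 0xCF6B), 9737: (None, 0x43DA, 0xD208),
    9877: (None, None, 0xD294), 9888: (None, None, 0xD29F),
    9916: (None, 0x43FB, 0xD2BB),
}
# Unicode 1.1 code point -> jamo decomposition in UnicodeData-1.1.5.txt
DECOMP_1_1 = {
    0x48B3: (0x1102, 0x1175, 0x11B4), 0x48B4: (0x1102, 0x1175, 0x11B5),
    0x35AA: (0x1102, 0x1175, 0x11B1), 0x384E: (0x1108, 0x1175, 0x11BB),
    0x40BC: (0x1108, 0x1175, 0x11BA), 0x436C: (0x110F, 0x1169, 0x11BE),
    0x43DA: (0x1110, 0x116C, 0x11BB), 0x43FB: (0x1110, 0x1173, 0x11AA),
}


def compose(l, v, t):
    """Compose conjoining jamo into a precomposed Unicode 2.0 syllable."""
    return 0xAC00 + ((l - 0x1100) * 21 + (v - 0x1161)) * 28 + (t - 0x11A7)


def mismatched_rows(table):
    """Rows whose 1.1 decomposition does not compose to the row's 2.0 code point."""
    return sorted(row for row, (_, u11, u20) in table.items()
                  if u11 is not None and compose(*DECOMP_1_1[u11]) != u20)


def same_old_values(before, after):
    """True if the amendment only moves (col 5, col 6) pairs between rows."""
    pairs = lambda table: Counter(v[:2] for v in table.values())
    return pairs(before) == pairs(after)


print("suspect :", mismatched_rows(SUSPECT))
print("amended :", mismatched_rows(AMENDED))
print("permutation only:", same_old_values(SUSPECT, AMENDED))
```

The check reports no mismatches for the amended rows, and confirms that the amendment only moves existing column 5 and 6 values between rows without introducing new ones.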