Technical Reports |
Version | 1 (draft 1) |
Authors | Mark Davis (markdavis@google.com), Jungshik Shin |
Date | 2009-01-27 |
This Version | http://www.unicode.org/reports/tr47/tr47-1.html |
Previous Version | n/a |
Latest Version | http://www.unicode.org/reports/tr47/ |
Revision | 1 |
Korean text can be represented in Unicode in different useful ways, including the customary NFC and NFD forms (see UAX #15, “Unicode Normalization Forms” [UAX15]). This document provides background information on the nature of Korean text, including clarifications of the atomic units of encoding at a very basic level, and defines three additional transformations that can be useful in processing Korean text. It also discusses transformations of compatibility characters.
This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.
A Unicode Technical Report (UTR) contains informative material. Conformance to the Unicode Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR.
Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].
Korean text can be represented in Unicode in different useful ways, including the customary NFC and NFD forms (see UAX #15, “Unicode Normalization Forms” [UAX15]). This document provides background information on the nature of Korean text, including clarifications of the atomic units of encoding at a very basic level, and defines three additional transformations that can be useful in processing Korean text. It also discusses transformations of compatibility characters.
A maximal decompounding transformation goes further than NFD in reducing Korean text to completely atomic units. A maximal compounding transformation provides the reverse operation, going further than NFC in forming compounds out of the completely atomic units. These are used by a wide variety of software, especially in open-source, and also correspond to certain keyboarding methods. Finally, the transformation favored by the new Korean Standard KS X 1026-1:2007 is described, wherein a syllable is either a single character or a sequence of exactly two or three characters as described below.
These transformations should not be mistaken for Unicode Normalization Forms; in particular, they do not replace NFC, the recommended format for interchange. However, it is straightforward to convert to and from NFC as needed.
At its heart, Korean can be represented by a sequence of atomic jamo characters. The atomic jamo characters are listed in the table below, broken out between modern and archaic.
Abbr. | Property Value Name | Comments |
L | Leading_Jamo | leading consonants; also called choseong |
V | Vowel_Jamo | vowels; also called jungseong |
T | Trailing_Jamo | trailing consonants; also called jongseong |
LV | LV_Syllable | |
LVT | LVT_Syllable | |
NA | Not_Applicable | including some Hangul compatibility characters: see below. |
14 Modern Atomic L Characters [ᄀᄂᄃᄅ-ᄇᄉᄋᄌᄎ-ᄒ] |
10 Archaic Atomic L Characters [ᄼᄾᅀᅌᅎᅐᅔᅕᅙ] |
U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK; U+1102 ( ᄂ ) HANGUL CHOSEONG NIEUN; U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT; U+1105 ( ᄅ ) HANGUL CHOSEONG RIEUL; U+1106 ( ᄆ ) HANGUL CHOSEONG MIEUM; U+1107 ( ᄇ ) HANGUL CHOSEONG PIEUP; U+1109 ( ᄉ ) HANGUL CHOSEONG SIOS; U+110B ( ᄋ ) HANGUL CHOSEONG IEUNG; U+110C ( ᄌ ) HANGUL CHOSEONG CIEUC; U+110E ( ᄎ ) HANGUL CHOSEONG CHIEUCH; U+110F ( ᄏ ) HANGUL CHOSEONG KHIEUKH; U+1110 ( ᄐ ) HANGUL CHOSEONG THIEUTH; U+1111 ( ᄑ ) HANGUL CHOSEONG PHIEUPH; U+1112 ( ᄒ ) HANGUL CHOSEONG HIEUH |
U+1140 ( ᅀ ) HANGUL CHOSEONG PANSIOS; U+114C ( ᅌ ) HANGUL CHOSEONG YESIEUNG; U+1159 ( ᅙ ) HANGUL CHOSEONG YEORINHIEUH; (for Chinese phonetics) U+113C ( ᄼ ) HANGUL CHOSEONG CHITUEUMSIOS; U+113E ( ᄾ ) HANGUL CHOSEONG CEONGCHIEUMSIOS; U+114E ( ᅎ ) HANGUL CHOSEONG CHITUEUMCIEUC; U+1150 ( ᅐ ) HANGUL CHOSEONG CEONGCHIEUMCIEUC; U+1154 ( ᅔ ) HANGUL CHOSEONG CHITUEUMCHIEUCH; U+1155 ( ᅕ ) HANGUL CHOSEONG CEONGCHIEUMCHIEUCH |
10 Modern Atomic V Characters [ᅡᅣᅥᅧᅩᅭᅮᅲᅳᅵ] |
1 Archaic Atomic V Characters [ᆞ] |
U+1161 ( ᅡ ) HANGUL JUNGSEONG A; U+1163 ( ᅣ ) HANGUL JUNGSEONG YA; U+1165 ( ᅥ ) HANGUL JUNGSEONG EO; U+1167 ( ᅧ ) HANGUL JUNGSEONG YEO; U+1169 ( ᅩ ) HANGUL JUNGSEONG O; U+116D ( ᅭ ) HANGUL JUNGSEONG YO; U+116E ( ᅮ ) HANGUL JUNGSEONG U; U+1172 ( ᅲ ) HANGUL JUNGSEONG YU; U+1173 ( ᅳ ) HANGUL JUNGSEONG EU; U+1175 ( ᅵ ) HANGUL JUNGSEONG I |
U+119E ( ᆞ ) HANGUL JUNGSEONG ARAEA |
14 Modern Atomic T Characters [ᆨᆫᆮᆯᆷᆸᆺᆼ-ᇂ] |
3 Archaic Atomic T Characters [ᇫᇰᇹ] |
U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK; U+11AB ( ᆫ ) HANGUL JONGSEONG NIEUN; U+11AE ( ᆮ ) HANGUL JONGSEONG TIKEUT; U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL; U+11B7 ( ᆷ ) HANGUL JONGSEONG MIEUM; U+11B8 ( ᆸ ) HANGUL JONGSEONG PIEUP; U+11BA ( ᆺ ) HANGUL JONGSEONG SIOS; U+11BC ( ᆼ ) HANGUL JONGSEONG IEUNG; U+11BD ( ᆽ ) HANGUL JONGSEONG CIEUC; U+11BE ( ᆾ ) HANGUL JONGSEONG CHIEUCH; U+11BF ( ᆿ ) HANGUL JONGSEONG KHIEUKH; U+11C0 ( ᇀ ) HANGUL JONGSEONG THIEUTH; U+11C1 ( ᇁ ) HANGUL JONGSEONG PHIEUPH; U+11C2 ( ᇂ ) HANGUL JONGSEONG HIEUH |
U+11EB ( ᇫ ) HANGUL JONGSEONG PANSIOS; U+11F0 ( ᇰ ) HANGUL JONGSEONG YESIEUNG; U+11F9 ( ᇹ ) HANGUL JONGSEONG YEORINHIEUH |
Many of the jamo characters are compound, combining two or more characters together. These cases only include "like" characters, such as two L characters or two V characters. These compound characters are, for historic reasons, neither compatibility nor canonical variants of their decompositions. The compound jamo characters are listed in the table below, broken out between modern and archaic. Because the archaic characters are so numerous, they are linked to rather than listed explicitly.
Note: even though the column on the right is titled Archaic, some of these characters may still be in use; they are just not part of Compound Syllables (see below). For more information, see http://i18nl10n.com/korean/uyeo.html.
[Ed note: the Unicode 5.2 characters need to be added to these lists.]
5 Modern Compound L Characters [ᄁᄄᄈᄊᄍ] |
62 Archaic Compound L Characters |
U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK; U+1104 ( ᄄ ) HANGUL CHOSEONG SSANGTIKEUT; U+1108 ( ᄈ ) HANGUL CHOSEONG SSANGPIEUP; U+110A ( ᄊ ) HANGUL CHOSEONG SSANGSIOS; U+110D ( ᄍ ) HANGUL CHOSEONG SSANGCIEUC |
[ᄓ-ᄻ ᄽ ᄿ ᅁ-ᅋ ᅍ ᅏ ᅑ-ᅓ ᅖ-ᅘ] |
11 Modern Compound V Characters [ᅢᅤᅦᅨᅪ-ᅬᅯ-ᅱᅴ] |
44 Archaic Compound V Characters |
U+1162 ( ᅢ ) HANGUL JUNGSEONG AE; U+1164 ( ᅤ ) HANGUL JUNGSEONG YAE; U+1166 ( ᅦ ) HANGUL JUNGSEONG E; U+1168 ( ᅨ ) HANGUL JUNGSEONG YE; U+116A ( ᅪ ) HANGUL JUNGSEONG WA; U+116B ( ᅫ ) HANGUL JUNGSEONG WAE; U+116C ( ᅬ ) HANGUL JUNGSEONG OE; U+116F ( ᅯ ) HANGUL JUNGSEONG WEO; U+1170 ( ᅰ ) HANGUL JUNGSEONG WE; U+1171 ( ᅱ ) HANGUL JUNGSEONG WI; U+1174 ( ᅴ ) HANGUL JUNGSEONG YI |
[ᅶ-ᆝ ᆟ-ᆢ] |
13 Modern Compound T Characters [ᆩᆪᆬᆭᆰ-ᆶᆹᆻ] |
52 Archaic Compound T Characters |
U+11A9 ( ᆩ ) HANGUL JONGSEONG SSANGKIYEOK; U+11AA ( ᆪ ) HANGUL JONGSEONG KIYEOK-SIOS; U+11AC ( ᆬ ) HANGUL JONGSEONG NIEUN-CIEUC; U+11AD ( ᆭ ) HANGUL JONGSEONG NIEUN-HIEUH; U+11B0 ( ᆰ ) HANGUL JONGSEONG RIEUL-KIYEOK; U+11B1 ( ᆱ ) HANGUL JONGSEONG RIEUL-MIEUM; U+11B2 ( ᆲ ) HANGUL JONGSEONG RIEUL-PIEUP; U+11B3 ( ᆳ ) HANGUL JONGSEONG RIEUL-SIOS; U+11B4 ( ᆴ ) HANGUL JONGSEONG RIEUL-THIEUTH; U+11B5 ( ᆵ ) HANGUL JONGSEONG RIEUL-PHIEUPH; U+11B6 ( ᆶ ) HANGUL JONGSEONG RIEUL-HIEUH; U+11B9 ( ᆹ ) HANGUL JONGSEONG PIEUP-SIOS; U+11BB ( ᆻ ) HANGUL JONGSEONG SSANGSIOS |
[ᇃ-ᇪ ᇬ-ᇯ ᇱ-ᇸ] |
For modern Korean, all 11,172 combinations of a modern L, a modern V, and an optional modern T character that form syllables are present in Unicode as compound syllables, from U+AC00 to U+D7A3.
U+AC00 ( 가 ) HANGUL SYLLABLE GA
U+AC01 ( 각 ) HANGUL SYLLABLE GAG
U+AC02 ( 갂 ) HANGUL SYLLABLE GAGG
U+AC03 ( 갃 ) HANGUL SYLLABLE GAGS
...
U+D7A0 ( 힠 ) HANGUL SYLLABLE HIK
U+D7A1 ( 힡 ) HANGUL SYLLABLE HIT
U+D7A2 ( 힢 ) HANGUL SYLLABLE HIP
U+D7A3 ( 힣 ) HANGUL SYLLABLE HIH
Compound Syllable | Component Compound Jamo | Component Atomic Jamo |
U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG | U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK,U+116A ( ᅪ ) HANGUL JUNGSEONG WA,U+11B0 ( ᆰ ) HANGUL JONGSEONG RIEUL-KIYEOK | U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK |
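The count of 11,172 follows from the modern jamo inventories listed above: 19 L characters (14 atomic plus 5 compound), 21 V characters (10 plus 11), and 27 T characters (14 plus 13) plus the case of no trailing consonant, giving 19 × 21 × 28. The following Python sketch illustrates the arithmetic relationship between modern jamo and compound syllables that the Unicode Standard specifies for Hangul syllables. The function and constant names are chosen here for illustration and are not part of this report.

```python
# Base code points and counts for the arithmetic mapping of Hangul syllables.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T index 0 means "no trailing consonant"
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # 11,172 compound syllables


def compose_syllable(l, v, t=None):
    """Map modern L, V and optional T code points to a compound syllable."""
    l_index, v_index = l - L_BASE, v - V_BASE
    t_index = 0 if t is None else t - T_BASE
    valid_t = t is None or 1 <= t_index < T_COUNT
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and valid_t):
        raise ValueError("not a combination of modern L, V, T characters")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


def decompose_syllable(s):
    """Map a compound syllable to its (L, V, T) code points; T may be None."""
    s_index = s - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a compound syllable")
    l = L_BASE + s_index // (V_COUNT * T_COUNT)
    v = V_BASE + (s_index // T_COUNT) % V_COUNT
    t_index = s_index % T_COUNT
    return l, v, (T_BASE + t_index if t_index else None)


if __name__ == "__main__":
    syllable = compose_syllable(0x1101, 0x116A, 0x11B0)
    print(f"U+{syllable:04X}", chr(syllable))
```

Note that this mapping stops at the compound jamo; it does not split a compound jamo such as U+1101 into atomic jamo.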
Unicode supplies several normalization forms, which are described in [Unicode Normalization Forms]. The most common of these, and the one recommended for use on the web, is called NFC, where C stands for Composition. For Korean characters, a string in this form is first put into a uniform decomposed form (NFD); then, for any modern L, V, and T characters, a V joins with a preceding L to form an LV compound syllable, and a T joins with a preceding LV to form an LVT compound syllable.
Examples where NFC composes:
U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK,U+116A ( ᅪ ) HANGUL JUNGSEONG WA → U+AF48 ( 꽈 ) HANGUL SYLLABLE GGWA
U+AF48 ( 꽈 ) HANGUL SYLLABLE GGWA,U+11B0 ( ᆰ ) HANGUL JONGSEONG RIEUL-KIYEOK → U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG
Normalization Form C only composes pairwise, in a single pass. It does not compose series of jamo of the same type.
Example where NFC does not compose:
U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG is not produced under NFC from: U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK
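The NFC behavior described above can be checked with the `unicodedata` module of the Python standard library. This is only an illustrative sketch; the helper name `nfc_codepoints` is an assumption made here.

```python
import unicodedata


def nfc_codepoints(text):
    """Return the code points of `text` after NFC, formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


if __name__ == "__main__":
    # Compound L + compound V compose into an LV syllable.
    print(nfc_codepoints("\u1101\u116A"))
    # LV syllable + compound T compose into an LVT syllable.
    print(nfc_codepoints("\uAF48\u11B0"))
    # A series of atomic jamo is only composed pairwise (one L with one V).
    print(nfc_codepoints("\u1100\u1100\u1169\u1161\u11AF\u11A8"))
```

In the last case only the second U+1100 and the following U+1169 combine (into U+ACE0); the remaining jamo are left as they are, so the result is not U+AF51.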
Unicode Normalization Forms are widely used, and have strict stability requirements that prevent any changes. However, other transformations can be specified for particular processing purposes.
Note that the visual appearance of NFC and NFD forms should be identical even though the underlying codes may be different.
The Maximal Korean Compounding and Decompounding transformations are parallel to the NFC and NFD transformations, but are based on a finer-grained use of atomic jamo characters. These forms are defined by the data tables in [MKD] and [MKC], using Unicode Locales (CLDR) transform rules. The following are textual descriptions of how they are formed.
Note that once a string is transformed by Maximal Korean Decompounding, it is stable under NFD; that is, any application of NFD to the string will make no further changes. Similarly, once a string is transformed by Maximal Korean Compounding, it is stable under NFC; that is, any application of NFC to the string will make no further changes.
[Ed Note: review the transform rules with regard to IEUNG vs. YESIEUNG.]
Example of Maximal Korean Decompounding:
Action | Text |
Input | U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG |
NFD | U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK,U+116A ( ᅪ ) HANGUL JUNGSEONG WA,U+11B0 ( ᆰ ) HANGUL JONGSEONG RIEUL-KIYEOK |
Add. Rules | U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK |
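The two steps of the decompounding example (NFD, then additional rules) can be sketched in Python as follows. The mapping table here is a hypothetical three-entry subset covering only the jamo in the example; it is not the [MKD] data, and the name `maximal_decompound` is chosen for illustration.

```python
import unicodedata

# Illustrative subset only: compound jamo mapped directly to atomic jamo.
# The complete rules are defined by the [MKD] data, not by this table.
DECOMPOUND = {
    "\u1101": "\u1100\u1100",  # SSANGKIYEOK  -> KIYEOK, KIYEOK
    "\u116A": "\u1169\u1161",  # WA           -> O, A
    "\u11B0": "\u11AF\u11A8",  # RIEUL-KIYEOK -> RIEUL, KIYEOK
}


def maximal_decompound(text):
    """Apply NFD, then replace each compound jamo by its atomic jamo."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(DECOMPOUND.get(ch, ch) for ch in nfd)


if __name__ == "__main__":
    result = maximal_decompound("\uAF51")
    print([f"U+{ord(ch):04X}" for ch in result])
    # The result is stable under NFD.
    print(unicodedata.normalize("NFD", result) == result)
```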
Example of Maximal Korean Compounding:
Action | Text |
Input | U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK |
Add. Rules | U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK,U+116A ( ᅪ ) HANGUL JUNGSEONG WA,U+11B0 ( ᆰ ) HANGUL JONGSEONG RIEUL-KIYEOK |
NFC | U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG |
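The compounding example can be sketched the same way: apply the additional rules to adjacent atomic jamo, then apply NFC. The sketch assumes the input is already in atomic jamo form, and its rule table is a hypothetical three-entry subset rather than the [MKC] data.

```python
import unicodedata

# Illustrative subset only: pairs of atomic jamo and the compound jamo they form.
# The complete rules are defined by the [MKC] data, not by this table.
COMPOUND = {
    "\u1100\u1100": "\u1101",  # KIYEOK, KIYEOK -> SSANGKIYEOK
    "\u1169\u1161": "\u116A",  # O, A           -> WA
    "\u11AF\u11A8": "\u11B0",  # RIEUL, KIYEOK  -> RIEUL-KIYEOK
}


def maximal_compound(text):
    """Merge adjacent jamo found in COMPOUND, then apply NFC."""
    out = []
    for ch in text:
        if out and out[-1] + ch in COMPOUND:
            out[-1] = COMPOUND[out[-1] + ch]
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    result = maximal_compound("\u1100\u1100\u1169\u1161\u11AF\u11A8")
    print([f"U+{ord(ch):04X}" for ch in result])
```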
Note that the visual appearance of NFC, NFD, Maximal Korean Decompounding, and Maximal Korean Compounding should be identical even though the underlying codes may be different.
The Korean Standard KS X 1026-1:2007 defines a format whereby the intent is for compound syllables to occur only where all of the characters in a syllable can be included, and otherwise syllables are represented using two or three jamo characters. The Common Korean Compounding transformation is intended to match that format, and is defined by the data tables in [CKC] using the Unicode Locales (CLDR) transform rules. The following is a textual description of how it is formed.
The Common Korean Compounding transformation first transforms the input string into NFC, and then applies NFD to exactly the following Compound Syllables in that result:
Importantly, a string transformed by Common Korean Compounding is not stable under NFC; if NFC is applied to the string, it may change. However, this will only happen if the Common Korean Compounding string contains archaic syllables. The transformation back from NFC is simple: just apply the rules given above.
[Ed Note: describe how to insert FILLER characters to match KS X 1026 recommendations.]
Example:
The following illustrates the difference between these formats.
[Ed Note: In the following table, add lines between each of the rows to make the following clearer, and use images for the syllables. Also break into three tables: one that shows the normal cases, where NFC is identical to Common Korean Compounding. The second shows cases where they differ, but no fillers are necessary. And the third shows how fillers would be inserted to make 'normal' syllable boundaries, as described in KS X 1026.]
NFC | NFC code points | → | Common Korean Compounding | Common Korean Compounding code points |
꽑 | U+AF51 | → | 꽑 | U+AF51 |
ᄀ꽑 | U+1100 U+AF51 | → | ᄀ꽑 | U+1100 U+1101 U+116A U+11B0 |
꽑ᅩ | U+AF51 U+1169 | → | 꽑ᅩ | U+AF51 U+1169 |
꽑ᆯ | U+AF51 U+11AF | → | 꽑ᆯ | U+1101 U+116A U+11B0 U+11AF |
꽈 | U+AF48 | → | 꽈 | U+AF48 |
ᄀ꽈 | U+1100 U+AF48 | → | ᄀ꽈 | U+1100 U+1101 U+116A |
꽈ᅩ | U+AF48 U+1169 | → | 꽈ᅩ | U+1101 U+116A U+1169 |
꽐 | U+AF50 | → | 꽐 | U+AF50 |
Note that the visual appearance of NFC, NFD, Maximal Korean Decompounding, Maximal Korean Compounding, and Common Korean Compounding should be identical even though the underlying codes may be different.
There are a number of Korean characters defined for compatibility in Unicode, split among three blocks. Disregarding unassigned characters, these are:
Hangul_Compatibility_Jamo
Most of these compatibility Korean characters are jamo characters, except for the following four (which are also the only ones to contain the word "KOREAN" instead of "HANGUL"):
U+321D ( ㈝ ) PARENTHESIZED KOREAN CHARACTER OJEON
U+321E ( ㈞ ) PARENTHESIZED KOREAN CHARACTER O HU
U+327C ( ㉼ ) CIRCLED KOREAN CHARACTER CHAMKO
U+327D ( ㉽ ) CIRCLED KOREAN CHARACTER JUEUI
The compatibility jamo characters corresponding to V characters are unproblematic; they are mapped to V characters by the Unicode compatibility normalizations (NFKC, NFKD). However, most other compatibility characters do not distinguish between the L and T forms for the consonants. They are mapped to L forms by the Unicode compatibility normalizations except where they cannot be represented by a modern L character. In that latter case they are mapped to T characters. Thus the Unicode compatibility normalizations do not normally result in well-formed syllables.
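The mappings described above can be observed with the `unicodedata` module of the Python standard library. This is an illustrative sketch only; the helper name `codepoints` is an assumption made here.

```python
import unicodedata


def codepoints(text, form):
    """Return the code points of `text` after the given normalization form."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]


if __name__ == "__main__":
    # A compatibility vowel (HANGUL LETTER A) maps to a V character.
    print(codepoints("\u314F", "NFKD"))
    # A compatibility consonant with a modern L form (HANGUL LETTER KIYEOK)
    # maps to that L character.
    print(codepoints("\u3131", "NFKD"))
    # A compatibility consonant without a modern L form
    # (HANGUL LETTER KIYEOK-SIOS) maps to a T character.
    print(codepoints("\u3133", "NFKD"))
    # KIYEOK, A, KIYEOK: the final consonant also becomes an L character,
    # so NFKC yields a syllable followed by a stray L, not a single LVT syllable.
    print(codepoints("\u3131\u314F\u3131", "NFKC"))
```

The last call shows why the compatibility normalizations alone do not normally yield well-formed syllables: the result is U+AC00 followed by U+1100.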
There are two better approaches to take. One is to use the Unicode compatibility normalizations, but insert FILLER characters so as to represent separate elements. This results in a format that is stable under all Unicode normalizations, and represents the compatibility characters as independent elements. However, for many purposes it is useful to transform the Hangul Letters and Halfwidth Hangul Letters into complete syllables, with L, V, and optionally T characters. This is common in keyboarding, for example.
The Maximal Korean Compatibility Compounding transformation is defined by the data tables in [MKKC], using the Unicode Locales transformation rules. The following is a textual description of how it is formed.
The Maximal Korean Compatibility Compounding transformation first converts any Compatibility or Halfwidth Hangul Letter L into a T if it is before an L form, then applies NFKD, then finally applies Maximal Korean Compounding.
Once a string has been transformed by Maximal Korean Compatibility Compounding, it is stable under NFKC; that is, any application of NFKC to the string will make no further changes.
[Ed Note: The transformation rules are just a stub at this point, and need to be filled out.]
Note that the visual appearance of Maximal Korean Compatibility Compounding will be different than its source if the source contained Hangul Compatibility Jamo or Halfwidth Jamo.
The specifications provided above are logical specifications; they can and should be optimized in production software. In particular, the Maximal Korean Compounding form can be generated in a single pass, without first doing a decompounding. Such an optimization is even easier to perform for Korean characters than for NFC, because compounding of Korean characters only takes place between adjacent characters.
This is done by generating a (logical) table for all pairs of characters that interact. That table can be used to make a single pass through the text, producing Maximal Korean Compounding as it goes. The pairs that interact fall into two categories:
Merge the pair into a single character. Example:
Replace the pair by a different string. This only happens in degenerate cases. Example:
U+11B1 ( ᆱ ) HANGUL JONGSEONG RIEUL-MIEUM, U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK → U+11D1 ( ᇑ ) HANGUL JONGSEONG RIEUL-MIEUM-KIYEOK, U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK
Example of Maximal Korean Compounding using pairwise combination:
Action | Text |
Input | U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1100 ( ᄀ ) HANGUL CHOSEONG KIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
Combine pair | → U+1101 ( ᄁ ) HANGUL CHOSEONG SSANGKIYEOK,U+1169 ( ᅩ ) HANGUL JUNGSEONG O,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
Combine pair | → U+AF2C ( 꼬 ) HANGUL SYLLABLE GGO,U+1161 ( ᅡ ) HANGUL JUNGSEONG A,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
Combine pair | → U+AF48 ( 꽈 ) HANGUL SYLLABLE GGWA,U+11AF ( ᆯ ) HANGUL JONGSEONG RIEUL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
Combine pair | → U+AF50 ( 꽐 ) HANGUL SYLLABLE GGWAL,U+11A8 ( ᆨ ) HANGUL JONGSEONG KIYEOK,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
No combination, move to next | U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,U+1103 ( ᄃ ) HANGUL CHOSEONG TIKEUT,... |
Combine pair | U+AF51 ( 꽑 ) HANGUL SYLLABLE GGWALG,→ U+1104 ( ᄄ ) HANGUL CHOSEONG SSANGTIKEUT,... |
Such a table, if it were expressed literally, would be huge: here are the counts for the replacement strings:
Length | Count of instances |
1 | 19,892 |
2 | 88,650 |
3 | 222,510 |
4 | 3,192 |
So in practice such a pairwise table would not be stored literally, but instead, because of the regularities in Korean compounding, would be primarily expressed via small tables and coded algorithms.
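As a rough illustration of "small tables and coded algorithms", the following Python sketch performs the single-pass pairwise combination using a small jamo pair table plus the arithmetic layout of the compound syllables. It is a sketch under stated assumptions, not the [MKC] rules: the pair table is a hypothetical four-entry subset, all names are chosen here for illustration, and the degenerate cases where a pair is replaced by a different string are not handled.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT

# Illustrative subset only (modern jamo); the full set comes from the [MKC] data.
PAIRS = {
    (0x1100, 0x1100): 0x1101,  # KIYEOK, KIYEOK -> SSANGKIYEOK
    (0x1103, 0x1103): 0x1104,  # TIKEUT, TIKEUT -> SSANGTIKEUT
    (0x1169, 0x1161): 0x116A,  # O, A           -> WA
    (0x11AF, 0x11A8): 0x11B0,  # RIEUL, KIYEOK  -> RIEUL-KIYEOK
}


def combine(a, b):
    """Return the single code point that replaces the pair (a, b), or None."""
    if (a, b) in PAIRS:                      # jamo + jamo of the same type
        return PAIRS[(a, b)]
    if L_BASE <= a < L_BASE + L_COUNT and V_BASE <= b < V_BASE + V_COUNT:
        # modern L + modern V -> LV syllable
        return S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT
    if S_BASE <= a < S_BASE + S_COUNT:
        s = a - S_BASE
        l, v, t = s // (V_COUNT * T_COUNT), (s // T_COUNT) % V_COUNT, s % T_COUNT
        if t == 0:
            if T_BASE < b < T_BASE + T_COUNT:        # LV + T -> LVT
                return a + (b - T_BASE)
            v2 = PAIRS.get((V_BASE + v, b))          # LV + V -> LV with compound V
            if v2 is not None:
                return S_BASE + (l * V_COUNT + (v2 - V_BASE)) * T_COUNT
        else:
            t2 = PAIRS.get((T_BASE + t, b))          # LVT + T -> LVT with compound T
            if t2 is not None:
                return a - t + (t2 - T_BASE)
    return None


def compound_single_pass(text):
    """Combine each character with the last output character where possible."""
    out = []
    for ch in text:
        merged = combine(ord(out[-1]), ord(ch)) if out else None
        if merged is None:
            out.append(ch)
        else:
            out[-1] = chr(merged)
    return "".join(out)


if __name__ == "__main__":
    text = "\u1100\u1100\u1169\u1161\u11AF\u11A8\u1103\u1103"
    print([f"U+{ord(ch):04X}" for ch in compound_single_pass(text)])
```

Run on the input of the pairwise example above, the sketch passes through the same intermediate values (U+1101, U+AF2C, U+AF48, U+AF50, U+AF51) before moving on to the two U+1103 characters.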
[Ed note: Describe the relations between the above forms, and give scenarios as to how they would be transformed back and forth. Note the security issues.]
[Ed note: Add thanks ...]
[CKC] | Common Korean Compounding Data. For the latest version, see: http://www.unicode.org/reports/tr47/ckc-1.txt |
[MKC] | Maximal Korean Compounding Data. For the latest version, see: http://www.unicode.org/reports/tr47/mkc-1.txt |
[MKD] | Maximal Korean Decompounding Data. For the latest version, see: http://www.unicode.org/reports/tr47/mkd-1.txt |
[MKKC] | Maximal Korean Compatibility Compounding Data. For the latest version, see: http://www.unicode.org/reports/tr47/mkkc-1.txt |
This section indicates the changes introduced by each revision.
Revision 1
First version
Copyright © 2009 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.