From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jun 14 2006 - 17:17:11 CDT
Theodore H. Smith continued:
> > Theodore H. Smith wrote:
> >> Does AC00 actually decompose?
> > Yes. See TUS section 3.12 "Conjoining Jamo Behavior",
> > <http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G24646>.
>
> But why isn't it listed in UnicodeData.txt?
TUS 4.0, p. 72:
D23 Canonical decomposition: The decomposition of a character
that results from recursively applying the canonical mappings
found in the names list of Section 16.1, Character Names List,
and those described in Section 3.12, Conjoining Jamo Behavior,
^^^
until no characters can be further decomposed, and then
reordering nonspacing marks according to Section 3.11,
Canonical Ordering Behavior.
TUS 4.0, p. 418:
A character names list is not provided for characters in
the Hangul Syllables block, U+AC00..U+D7AF, because the
name of a Hangul syllable can be determined by algorithm
as described in Section 3.12, Conjoining Jamo Behavior.
UnicodeData.txt:
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
Those two entries mark the first and last code points of
the Hangul syllables range, rather than listing all 11,172
Hangul syllables individually. Every one of them has a name
and a decomposition derivable by algorithm (see above).
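
For illustration, here is a minimal Python sketch of the arithmetic that Section 3.12 describes. It is not taken from the standard's text verbatim. The function names (decompose_hangul, hangul_name) and constant names are my own, and the jamo short-name tables are typed in by hand for the example.

```python
# Sketch of the Hangul syllable algorithms (assumed helper names, not an official API).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 syllables in total

# Jamo short names used to build syllable names (empty string = no letter).
L_NAMES = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
           "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
V_NAMES = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
           "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
T_NAMES = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
           "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
           "K", "T", "P", "H"]


def _indices(ch):
    """Return (l, v, t) jamo indices, or None if ch is not a Hangul syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    return (s_index // N_COUNT,
            (s_index % N_COUNT) // T_COUNT,
            s_index % T_COUNT)


def decompose_hangul(ch):
    """Full canonical decomposition of one Hangul syllable into conjoining jamo."""
    idx = _indices(ch)
    if idx is None:
        return ch  # not a Hangul syllable: left unchanged by this sketch
    l, v, t = idx
    result = chr(L_BASE + l) + chr(V_BASE + v)
    if t != 0:  # index 0 means "no trailing consonant"
        result += chr(T_BASE + t)
    return result


def hangul_name(ch):
    """Derive the character name of one Hangul syllable."""
    idx = _indices(ch)
    if idx is None:
        raise ValueError("not a Hangul syllable")
    l, v, t = idx
    return "HANGUL SYLLABLE " + L_NAMES[l] + V_NAMES[v] + T_NAMES[t]
```

With these definitions, decompose_hangul("\uAC00") gives U+1100 U+1161, and hangul_name("\uAC00") gives "HANGUL SYLLABLE GA", without any per-character entry in UnicodeData.txt.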
--Ken