From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Wed Jun 14 2006 - 18:14:59 CDT
> > But why isn't it listed in UnicodeData.txt?
> >
> Because that would double the size of the file, and because the
> decompositions are algorithmic and are most often implemented
> that way (rather than being driven by tables).
More specifically: they are even *arithmetic*.
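To illustrate how arithmetic it is, here is a minimal Python sketch. It uses the constants of the Hangul syllable algorithm from the Conjoining Jamo Behavior section of chapter 3 of the Unicode Standard: S_BASE = U+AC00, L_BASE = U+1100, V_BASE = U+1161, T_BASE = U+11A7, with 19 leading consonants, 21 vowels and 28 trailing positions, where position 0 means "no trailing consonant". The function name `decompose_syllable` is only illustrative and does not come from any library. The result is cross-checked against `unicodedata.normalize`.

```python
import unicodedata

# Constants of the Hangul syllable algorithm (Unicode Standard, chapter 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_syllable(ch):
    """Full canonical decomposition of one precomposed Hangul syllable,
    computed with integer arithmetic instead of a table lookup."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable: leave unchanged
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    jamo = [l, v] if t == T_BASE else [l, v, t]
    return "".join(map(chr, jamo))

# Every one of the 11172 syllables agrees with unicodedata's NFD
assert all(
    decompose_syllable(chr(S_BASE + i)) == unicodedata.normalize("NFD", chr(S_BASE + i))
    for i in range(S_COUNT)
)
print([hex(ord(c)) for c in decompose_syllable("\uAC01")])  # ['0x1100', '0x1161', '0x11a8']
```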
But they COULD have been listed in the datafile HangulSyllableType.txt
without adding a single line to that datafile (just lengthening most
of the lines a little bit)...
AC00; LV; 1100 1161 # Lo HANGUL SYLLABLE GA
...
AC01..AC1B; LVT; AC00 11A8..AC00 11C2 # Lo [27] HANGUL SYLLABLE GAG..HANGUL SYLLABLE GAH
...
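Such lines could be generated mechanically, because each syllable's two-code-point canonical mapping is either <L, V> (for an LV syllable) or <LV, T> (for an LVT syllable). The sketch below assumes the extra mapping field proposed above, which the real HangulSyllableType.txt does not have. The helper names `canonical_pair` and `proposed_lines` are hypothetical, and the spacing follows this message rather than the real file.

```python
import unicodedata

# Hangul syllable algorithm constants (Unicode Standard, chapter 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588

def canonical_pair(cp):
    """Two-code-point canonical mapping of a precomposed Hangul syllable."""
    s_index = cp - S_BASE
    t_index = s_index % T_COUNT
    if t_index == 0:
        # LV syllable: maps to <L, V>
        return (L_BASE + s_index // N_COUNT,
                V_BASE + (s_index % N_COUNT) // T_COUNT)
    # LVT syllable: maps to <LV, T>, LV being the same syllable without T
    return (cp - t_index, T_BASE + t_index)

def proposed_lines(lv):
    """Hypothetical extended data lines for one LV syllable and the
    27 LVT syllables that follow it (the proposal, not the real format)."""
    def pair(cp):
        return " ".join(f"{x:04X}" for x in canonical_pair(cp))

    def label(cp):
        return unicodedata.name(chr(cp))

    first, last = lv + 1, lv + T_COUNT - 1
    gc = unicodedata.category(chr(lv))
    return [
        f"{lv:04X}; LV; {pair(lv)} # {gc} {label(lv)}",
        f"{first:04X}..{last:04X}; LVT; {pair(first)}..{pair(last)} "
        f"# {gc} [{last - first + 1}] {label(first)}..{label(last)}",
    ]

for line in proposed_lines(0xAC00):
    print(line)
```

Since every LVT run maps to the same LV followed by consecutive T jamo, a whole run fits in one ranged field like the one above. `proposed_lines` assumes it is given an LV syllable and does not check this.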
As with all canonical decompositions, there are at most two characters
in the decomposition *mapping*. This matters for the *composition*
step in normalisation.
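Because each mapping has at most two code points, composition can proceed one pair at a time: the last starter and the next character either combine into a single primary composite or they do not. Below is a minimal Hangul-only sketch of that pairwise step. The names `compose_pair` and `compose_hangul` are hypothetical. The sketch skips canonical reordering and all non-Hangul compositions, which full NFC also needs, so it is compared with `unicodedata.normalize` only on Hangul input.

```python
import unicodedata

# Hangul syllable algorithm constants (Unicode Standard, chapter 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172

def compose_pair(a, b):
    """Primary composite of the pair <a, b>, or None if they don't compose."""
    l_index, v_index = a - L_BASE, b - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        # <L, V> -> LV syllable
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    s_index, t_index = a - S_BASE, b - T_BASE
    if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
            and 0 < t_index < T_COUNT):
        # <LV, T> -> LVT syllable
        return a + t_index
    return None

def compose_hangul(text):
    """Compose left to right, one pair at a time (Hangul-only sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            composite = compose_pair(out[-1], cp)
            if composite is not None:
                out[-1] = composite  # the composite becomes the new starter
                continue
        out.append(cp)
    return "".join(map(chr, out))

s = "\u1100\u1161\u11A8\u1100\u1162"  # KIYEOK A KIYEOK(final) KIYEOK AE
assert compose_hangul(s) == unicodedata.normalize("NFC", s) == "\uAC01\uAC1C"
```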
Note that the true letters (in origin) of Hangul are the Hangul Jamos that
consist of only one "element". The two- and three-"element" Jamos are
composites (though they have no formal decomposition in Unicode),
although some of the double ones may now be considered letters.
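This is easy to check with Python's `unicodedata`. The four code points below are one illustrative pick: a one-element jamo, a doubled one, a two-element cluster and an archaic three-element cluster. None of them has any decomposition mapping.

```python
import unicodedata

# One-element, doubled, two-element and (archaic) three-element jamo
for ch in ("\u1100", "\u1101", "\u11AA", "\u1122"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposition={unicodedata.decomposition(ch)!r}")
```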
(The "HANGUL LETTER"s are another story, which I will not expand on now,
other than to say that their formal compatibility decompositions aren't
useful in any context.)
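For reference, the compatibility decompositions in question map HANGUL LETTERs (the compatibility jamo) to conjoining jamo. A short Python check follows; the example strings are arbitrary. It shows one visible consequence, offered only as an illustration and not necessarily the reason meant above. HANGUL LETTER KIYEOK maps to the *leading* jamo U+1100, so under NFKC a trailing KIYEOK does not join the preceding syllable.

```python
import unicodedata

# U+3131 HANGUL LETTER KIYEOK and U+314F HANGUL LETTER A
print(unicodedata.decomposition("\u3131"))  # <compat> 1100
print(unicodedata.decomposition("\u314F"))  # <compat> 1161

# KIYEOK + A compose under NFKC to HANGUL SYLLABLE GA (U+AC00) ...
assert unicodedata.normalize("NFKC", "\u3131\u314F") == "\uAC00"
# ... but a trailing KIYEOK becomes CHOSEONG KIYEOK (U+1100), not a
# JONGSEONG, so the result is GA + U+1100 rather than GAG (U+AC01)
assert unicodedata.normalize("NFKC", "\u3131\u314F\u3131") == "\uAC00\u1100"
```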
Note also that the HANGUL SYLLABLEs, though many, do NOT include
ALL (historic) Hangul syllables.
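The precomposed set is exactly 19 × 21 × 28 = 11172 syllables (U+AC00..U+D7A3), built only from the modern jamo. Any syllable using an archaic jamo must therefore stay a jamo sequence. The sketch below uses U+1113 HANGUL CHOSEONG NIEUN-KIYEOK as one arbitrary example.

```python
import unicodedata

L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
print(L_COUNT * V_COUNT * T_COUNT)  # 11172 precomposed syllables

# U+1113 is an archaic leading consonant outside the 19 modern ones,
# so NIEUN-KIYEOK + A has no precomposed form and NFC leaves it alone
old = "\u1113\u1161"
assert unicodedata.normalize("NFC", old) == old
# The modern sequence KIYEOK + A does compose, to U+AC00
assert unicodedata.normalize("NFC", "\u1100\u1161") == "\uAC00"
```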
/kent k
This archive was generated by hypermail 2.1.5 : Wed Jun 14 2006 - 18:24:15 CDT