From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Sun Dec 21 2003 - 10:27:47 EST
Philippe Verdy wrote:
...
> There are two new files to add in the UCD: one that defines these
> extra "canonical Johab" compositions into Basic Jamos (I would call
> it "HangulBasic.txt")
I will try to catch up with what you have written about Hangul the
last few days, and I'm happy to get some review comments on the
collation tailoring. But give me a few days to catch up (and there are
Christmas holidays coming up too...).
But first, regarding KAPYEOUN-:
The correct letter in the KAPYEOUN- constructs is an IEUNG, which was
initially a silent letter wherever it occurred, and was for some time
used to mark a particular consonant sound variation. YESIEUNG was
initially a "ng" wherever it occurred. They later merged, and the
"modern" IEUNG is silent as an initial consonant, but "ng" as a
final consonant. Glyphically they merged too. Initially IEUNG was just
a ring, and YESIEUNG a ring with a clear "shoot" on top.
I gave the KAPYEOUN- constructs a special weight since they
(apparently) collate a bit later than what would be the result of just
using the IEUNG weight. A reasonable simplification could be to
just use the IEUNG weight also for the KAPYEOUN- constructs. But
note that this results in a slightly different ordering.
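A toy sketch of how the two choices can reorder things. It assumes made-up primary weights, treats each string as a plain sequence of letter names, and places the special KAPYEOUN weight just after IEUNG; the weights, the `sort_key`/`collate` helpers and the sample strings are all hypothetical, not taken from the actual tailoring file.

```python
# Toy collation sketch: hypothetical primary weights, NOT the real tailoring.
# Only the relative order of the numbers matters here.
BASE_WEIGHTS = {
    "KIYEOK": 10,
    "PIEUP": 20,
    "SIOS": 30,
    "IEUNG": 40,
    "CIEUC": 50,
}

# Assumption: the special KAPYEOUN weight sits just after IEUNG (and before
# the next letter), so KAPYEOUN- constructs collate "a bit later" than IEUNG.
KAPYEOUN_SPECIAL = 41

# A KAPYEOUN- construct expands to its base letter plus IEUNG.
EXPANSIONS = {
    "KAPYEOUNPIEUP": ("PIEUP", "IEUNG"),
}


def sort_key(letters, special_kapyeoun):
    """Build a tuple of primary weights for a sequence of letter names."""
    key = []
    for letter in letters:
        if letter in EXPANSIONS:
            base, _ieung = EXPANSIONS[letter]
            key.append(BASE_WEIGHTS[base])
            # Either the special KAPYEOUN weight or the plain IEUNG weight.
            if special_kapyeoun:
                key.append(KAPYEOUN_SPECIAL)
            else:
                key.append(BASE_WEIGHTS["IEUNG"])
        else:
            key.append(BASE_WEIGHTS[letter])
    return tuple(key)


def collate(words, special_kapyeoun):
    """Sort words (tuples of letter names) under the chosen weighting."""
    return sorted(words, key=lambda w: sort_key(w, special_kapyeoun))


words = [
    ("KAPYEOUNPIEUP",),
    ("PIEUP", "IEUNG", "CIEUC"),
]

for special in (False, True):
    print("special" if special else "plain", collate(words, special))
```

With the plain IEUNG weight, KAPYEOUNPIEUP gets the key (PIEUP, IEUNG), which is a prefix of the second string's key, so it sorts first. With the special weight its second element outranks IEUNG, so it sorts after PIEUP IEUNG CIEUC.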
I also have two additional datafiles... One for the pseudo-canonical
decompositions of multiletter jamos and pseudo-compatibility
decompositions for the compatibility Hangul letters, and one for
the special decompositions for KS X 1001 for Hangul compatibility
letters. Note that the pseudo-canonical decompositions I have
are into two jamo characters, since all other non-singleton
canonical decompositions are into two other characters. That
way the recomposition algorithm can be changed minimally.
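A rough sketch of why two-character decompositions keep the recomposition change small. The table entries below are illustrative only (they are not the contents of the data file mentioned above), and the left-branching split of three-letter clusters is an assumption. It mirrors how an ordinary canonical decomposition such as U+1E08 maps to U+00C7 U+0301. The composition step is a simplified greedy pairwise pass that ignores combining classes, blocking and composition exclusions.

```python
# Hypothetical pairwise ("pseudo-canonical") decomposition table.
# Every multi-letter jamo maps to exactly TWO jamos, like the
# ordinary non-singleton canonical decompositions.
PAIR_DECOMP = {
    "SSANGKIYEOK": ("KIYEOK", "KIYEOK"),
    "PIEUP-SIOS": ("PIEUP", "SIOS"),
    # Assumption: three-letter clusters branch to the left.
    "PIEUP-SIOS-KIYEOK": ("PIEUP-SIOS", "KIYEOK"),
    # KAPYEOUN- constructs: base letter plus IEUNG.
    "KAPYEOUNPIEUP": ("PIEUP", "IEUNG"),
}

# Inverse table used for recomposition: (first, second) -> composite.
PAIR_COMP = {pair: composite for composite, pair in PAIR_DECOMP.items()}


def decompose(letters):
    """Apply the pairwise mappings recursively until only single letters remain."""
    out = []
    for letter in letters:
        if letter in PAIR_DECOMP:
            out.extend(decompose(PAIR_DECOMP[letter]))
        else:
            out.append(letter)
    return out


def recompose(letters):
    """Greedy left-to-right pairwise composition (simplified, no blocking rules)."""
    out = []
    for letter in letters:
        if out and (out[-1], letter) in PAIR_COMP:
            # Merge with the previous letter, as canonical composition does.
            out[-1] = PAIR_COMP[(out[-1], letter)]
        else:
            out.append(letter)
    return out


print(decompose(["PIEUP-SIOS-KIYEOK"]))
print(recompose(["PIEUP", "SIOS", "KIYEOK"]))
```

Because every entry is a pair, the composition loop only ever looks at "previous result + next letter." That is the same shape an existing canonical composition routine already has, so only the lookup table needs to grow.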
I'll check your comments, and we could then exchange files for
off-line checking. (But allow a few days for delivery ;-)
/kent k
This archive was generated by hypermail 2.1.5 : Sun Dec 21 2003 - 11:14:04 EST