From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Dec 21 2003 - 18:03:06 EST
Kent Karlsson wrote:
> Philippe Verdy wrote:
> ...
> > Here is what I have (this is just the part related to Hangul
> > jamos in the Johab set), presented in collation order:
> > # add canonical de/recomposition of "Johab" compound leading consonnant jamos in Hangul
> > # (there are 17 basic consonnants) in Hangul, IEUNG is used for KAPYEOUN-
> > #1100;HANGUL CHOSEONG KIYEOK;Lo;0;L;;;;;N;;G;;;
> > 1101;HANGUL CHOSEONG SSANGKIYEOK;Lo;0;L;<johab> 1100 1100;;;;N;;GG;;;
> ...
>
> When possible, I've preferred the "left associative" reading, just
> to make it easier for the recomposition. I don't thing there is any
> linguistic reason for prefering the "right associative" reading for
> any of these. The current interpretation for doubled consonants is
> a modern one; I think the historic reading is different (but not
> quite sure exactly how).
Here also I have no good hint on which association is preferred,
except the normative name. Of course this is just an intermediate
decomposition, and it can be expanded before actual use. (In fact
there are cases where expanding directly to three letters
is already needed because there is no corresponding pair, notably
if we have to map some compatibility clusters to johab clusters,
so this pairwise view is just there to simplify editing the rules.)
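To illustrate what "expandable before actual use" means, here is a minimal
Java sketch. It is not my actual tool: the class name, the PAIRS table and the
expand() method are hypothetical, and the table holds only a few pair rules
taken from this thread. It shows that the left-associative rule (1121 1100)
and the right-associative one (1107 112D) for 1122 both expand to the same
three basic jamos.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: recursive expansion of intermediate <johab> pair rules.
class JohabExpansion {
    // Assumed pair rules (illustration only, not the complete file).
    static final Map<Integer, int[]> PAIRS = new HashMap<>();
    static {
        PAIRS.put(0x1101, new int[] {0x1100, 0x1100}); // SSANGKIYEOK
        PAIRS.put(0x1121, new int[] {0x1107, 0x1109}); // PIEUP-SIOS
        PAIRS.put(0x112D, new int[] {0x1109, 0x1100}); // SIOS-KIYEOK
        PAIRS.put(0x1122, new int[] {0x1121, 0x1100}); // PIEUP-SIOS-KIYEOK, left-associative
    }

    // Expands a code point until only jamos without a rule remain.
    static List<Integer> expand(int cp, Map<Integer, int[]> rules) {
        List<Integer> out = new ArrayList<>();
        expandInto(cp, rules, out);
        return out;
    }

    private static void expandInto(int cp, Map<Integer, int[]> rules, List<Integer> out) {
        int[] parts = rules.get(cp);
        if (parts == null) {          // no rule: this is a basic jamo
            out.add(cp);
            return;
        }
        for (int part : parts) {      // expand each component in order
            expandInto(part, rules, out);
        }
    }

    // Formats a sequence as space-separated 4-digit hex code points.
    static String format(List<Integer> cps) {
        StringBuilder sb = new StringBuilder();
        for (int cp : cps) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", cp));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Left-associative rule for 1122.
        System.out.println(format(expand(0x1122, PAIRS))); // 1107 1109 1100

        // Same table, but with the right-associative alternative for 1122.
        Map<Integer, int[]> right = new HashMap<>(PAIRS);
        right.put(0x1122, new int[] {0x1107, 0x112D});
        System.out.println(format(expand(0x1122, right))); // 1107 1109 1100
    }
}
```

Since both readings end in the same fully expanded sequence, the choice of
association only matters for how the rules are edited and recomposed, not for
the final decomposition.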
>
> There are also some direct errors in your mappings (detailed below).
>
> 111B;HANGUL CHOSEONG KAPYEOUNRIEUL;Lo;0;L;<johab> 1105 114C;;;;N;;RQ;;;
> 111D;HANGUL CHOSEONG KAPYEOUNMIEUM;Lo;0;L;<johab> 1106 114C;;;;N;;MQ;;;
> 112C;HANGUL CHOSEONG KAPYEOUNSSANGPIEUP;Lo;0;L;<johab> 1108 114C;;;;N;;BBQ;;;
> 112B;HANGUL CHOSEONG KAPYEOUNPIEUP;Lo;0;L;<johab> 1107 114C;;;;N;;BQ;;;
> -----PLAIN WRONG, yesieung used instead of ieung
Thanks for pointing out these 3 errors. I did not see them despite rereading
the file so many times and checking the generated trace file, which
displays actual characters and not just code points.
> 11F4;HANGUL JONGSEONG KAPYEOUNPHIEUPH;Lo;0;L;<johab> 11C1 11E6;;;;N;;pq;;;
> ------PLAIN WRONG, 11E6 instead of 11BC
This one is an obvious copy/paste error made when creating the rules.
For the other two alternatives, I'll try to make them consistent with the
left-associative rule used generally in canonical decompositions:
> 1122;HANGUL CHOSEONG PIEUP-SIOS-KIYEOK;Lo;0;L;<johab> 1107 112D;;;;N;;BSG;;;
> --- one of two alts, 1121 1100 preferable
For example, this rule should effectively be a simple extension of
the rule in the previous line related to 1121. Fortunately, these
are not errors by themselves. I still have many tests to do
with them, comparing the results of various plain-text
search operations that should find or exclude matches.
Also, the file I gave you was the last one I had verified, and I
have another version that includes more characters (notably
the <narrow> decompositions).
In fact, it was your initial comment and the N1051 document
that gave me the idea to reorder the rules in collation order for
the Hangul script (before that they were in code point order, which
was even more difficult to edit and verify manually). I have
just adapted my parser to use a sorted map (a TreeMap in Java)
instead of a Vector, just to generate a sorted list on output.
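As a rough illustration of that change (a sketch, not my actual parser), the
Java code below stores each rule line in a TreeMap keyed by a sort key, so
that iterating over the map yields the lines in sorted order. The class and
method names are hypothetical, and using field 11 of each line (the "G",
"GG", "BSG" column) as the sort key with plain String ordering is only an
assumption standing in for the real collation order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: emit rule lines sorted by a key instead of input order.
class SortedRules {
    // Assumption: field 11 of a semicolon-separated line holds the sort key.
    static final int KEY_FIELD = 11;

    static List<String> sortByKey(List<String> lines) {
        // TreeMap keeps its entries ordered by key (natural String order here).
        Map<String, String> sorted = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(";", -1); // -1 keeps trailing empty fields
            if (fields.length <= KEY_FIELD) {
                continue; // skip lines that do not look like rules
            }
            // Note: a later line with the same key replaces the earlier one;
            // a real tool would need a tie-breaker such as the code point.
            sorted.put(fields[KEY_FIELD], line);
        }
        return new ArrayList<>(sorted.values());
    }

    public static void main(String[] args) {
        // Input in code point order.
        List<String> input = List.of(
            "1101;HANGUL CHOSEONG SSANGKIYEOK;Lo;0;L;<johab> 1100 1100;;;;N;;GG;;;",
            "111D;HANGUL CHOSEONG KAPYEOUNMIEUM;Lo;0;L;<johab> 1106 110B;;;;N;;MQ;;;",
            "1122;HANGUL CHOSEONG PIEUP-SIOS-KIYEOK;Lo;0;L;<johab> 1121 1100;;;;N;;BSG;;;");
        for (String line : sortByKey(input)) {
            System.out.println(line); // keys come out as BSG, GG, MQ
        }
    }
}
```

With a Vector the output order is simply the insertion order; with the sorted
map the order depends only on the keys, whatever order the rules were read in.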
Thanks a lot.
Philippe.
(Oh! Your message came to the list, even though I gave you my file
in private with the authorization to copy it, so I suppose I
can reply publicly here to this one, no? If this was an error,
admit that it's sometimes difficult to reply to the right
place when there's no instruction and the initial thread was
public...) ;-)
This archive was generated by hypermail 2.1.5 : Sun Dec 21 2003 - 18:45:23 EST