From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Thu Dec 04 2003 - 08:10:17 EST
Philippe Verdy wrote:
...
> letters each. Fortunately, the definition of Hangul syllable blocks need
> not be changed, as it works well with Hangul syllables as L+, V+, T*
> (where L, V, and T stand for single-letter jamos).
In fact the Unicode encoding of modern Hangul syllables is more
accurately:
(Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)*
where Ls,Vs,Ts are single-letter L,V,T modern jamos
and Lm,Vm,Tm are multiple-letter L,V,T modern jamos
Yes, but that goes beyond what I wanted to say.
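For readers who want to experiment, the simpler pattern quoted above can be sketched as a Python regular expression. This is only an illustration, not part of either poster's proposal: the ranges are the modern conjoining jamos of the Hangul Jamo block, the single-letter and multiple-letter classes (Ls|Lm etc.) are merged into one range each because the pattern does not distinguish them, and the names SYLLABLE and is_jamo_syllable are made up here.

```python
import re

# Modern conjoining jamos (single- and multiple-letter ones merged per class).
L = "[\u1100-\u1112]"  # leading consonants
V = "[\u1161-\u1175]"  # vowels
T = "[\u11A8-\u11C2]"  # trailing consonants

# (Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)*
SYLLABLE = re.compile(f"{L}+{V}+{T}*")


def is_jamo_syllable(s: str) -> bool:
    """True if s is exactly one L+ V+ T* block of modern jamos."""
    return SYLLABLE.fullmatch(s) is not None
```

For instance, is_jamo_syllable("\u1112\u1161\u11AB") accepts the jamo spelling of HAN, while a lone vowel such as "\u1161" is rejected because the pattern requires at least one lead consonant.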
If we count also the encoded modern LV and LVT johab syllables:
( ( (Ls|Lm)+ (Vs|Vm)+ )
| ( (Ls|Lm)* (LsVs|LsVm|LmVs|LmVm) (Vs|Vm)* )
| ( (Ls|Lm)* (LsVsTs|LsVmTs|LmVsTs|LmVmTs|
LsVsTm|LsVmTm|LmVsTm|LmVmTm) )
) (Ts|Tm)*
I'm not even going to try to parse that...
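The longer pattern is easier to read once the eight LVT alternatives are collapsed into "any precomposed syllable with a trailing consonant" and the four LV alternatives into "any precomposed syllable without one". A rough Python sketch under that assumption follows; BLOCK, LV_CHARS and is_syllable_block are hypothetical names, and the constants are the usual ones for the precomposed Hangul syllables (U+AC00..U+D7A3, 28 code points per LV group).

```python
import re

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28

# LV syllables are every 28th code point of the precomposed block.
LV_CHARS = "".join(chr(S_BASE + i) for i in range(0, S_COUNT, T_COUNT))

L = "[\u1100-\u1112]"  # modern leading consonants
V = "[\u1161-\u1175]"  # modern vowels
T = "[\u11A8-\u11C2]"  # modern trailing consonants
LV = f"[{LV_CHARS}]"
# LVT: any precomposed syllable that is not an LV syllable.
LVT = f"(?![{LV_CHARS}])[\uAC00-\uD7A3]"

# ( L+ V+ | L* LV V* | L* LVT ) T*
BLOCK = re.compile(f"(?:{L}+{V}+|{L}*{LV}{V}*|{L}*{LVT}){T}*")


def is_syllable_block(s: str) -> bool:
    """True if s is exactly one block under the extended pattern."""
    return BLOCK.fullmatch(s) is not None
```

Under this sketch an LV syllable may still be followed by further vowels and trailing consonants, whereas an LVT syllable may only be followed by trailing consonants, which is what the quoted pattern says.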
The idea is to allow decomposing Lm, Vm, or Tm into sequences of
Ls, Vs, or Ts using supplementary decompositions, including for the
compatibility Hangul syllables.
So this will effectively produce syllables encoded only with
Ls+ Vs+ Ts*
That's what I said.
Then to recompose them as much as possible to build Lm,Vm,Tm jamos,
One can do that, yes (but not as part of Unicode normalisation).
and then reassemble them into either johab syllables (LV or LVT),
Yes. As for NFC, using the arithmetically specified decompositions for
LV (into <L, V>) and LVT (into <LV, T>, as they are more properly done),
inverted, recursively.
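As a concrete illustration of the arithmetic, here is a minimal Python sketch of the inverted mappings, <L, V> to LV and <LV, T> to LVT, applied left to right. The constants are the standard ones for the Hangul syllable block; the function names compose_pair and compose_hangul are invented for this example, and the sketch handles only modern jamos and precomposed syllables, not the rest of NFC.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables


def compose_pair(a: str, b: str):
    """Return the syllable for <L, V> or <LV, T>, or None if neither applies."""
    ca, cb = ord(a), ord(b)
    l_index, v_index = ca - L_BASE, cb - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        # <L, V> -> LV
        return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
    s_index, t_index = ca - S_BASE, cb - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        # <LV, T> -> LVT (only an LV syllable can take a trailing consonant)
        return chr(ca + t_index)
    return None


def compose_hangul(text: str) -> str:
    """Apply the two arithmetic compositions greedily, left to right."""
    out = []
    for ch in text:
        composed = compose_pair(out[-1], ch) if out else None
        if composed is not None:
            out[-1] = composed
        else:
            out.append(ch)
    return "".join(out)
```

Because the LV result of the first step is itself the left operand of the second step, the <L, V, T> case falls out of applying compose_pair twice, which is the "recursively" above.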
Note that to ensure uniqueness, the non-arithmetically specified jamo
compositions (NOT a part of any Unicode normalisation) for a syllable
must be done fully before any of the arithmetically specified compositions
on that syllable.
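The ordering constraint can be seen in a small Python sketch. The table JAMO_PAIRS below is hypothetical and deliberately incomplete (a handful of letter pairs only), and, as said, nothing like it is part of any Unicode normalisation; the standard library's NFC is used here merely as a stand-in for the arithmetically specified step.

```python
import unicodedata

# Hypothetical, incomplete table of non-arithmetic jamo compositions.
JAMO_PAIRS = {
    ("\u1100", "\u1100"): "\u1101",  # lead KIYEOK + KIYEOK -> SSANGKIYEOK
    ("\u1109", "\u1109"): "\u110A",  # lead SIOS + SIOS -> SSANGSIOS
    ("\u1169", "\u1161"): "\u116A",  # vowel O + A -> WA
    ("\u116E", "\u1165"): "\u116F",  # vowel U + EO -> WEO
    ("\u11A8", "\u11A8"): "\u11A9",  # trail KIYEOK + KIYEOK -> SSANGKIYEOK
    ("\u11A8", "\u11BA"): "\u11AA",  # trail KIYEOK + SIOS -> KIYEOK-SIOS
}


def compose_jamos(text: str) -> str:
    """Step 1: merge adjacent jamo letters using the table, left to right."""
    out = []
    for ch in text:
        if out and (out[-1], ch) in JAMO_PAIRS:
            out[-1] = JAMO_PAIRS[(out[-1], ch)]
        else:
            out.append(ch)
    return "".join(out)


def compose_syllables(text: str) -> str:
    """Step 2, only after step 1: the arithmetic LV / LVT composition."""
    return unicodedata.normalize("NFC", compose_jamos(text))
```

With the input <KIYEOK, KIYEOK, A> (U+1100 U+1100 U+1161), doing the arithmetic step first leaves a stray KIYEOK in front of U+AC00, while compose_syllables yields the single syllable U+AE4C.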
or into some compatibility syllables (historic syllables starting
with vowels).
"Compatibility syllables"?
None of the historic syllables start with a vowel letter. YESIEUNG, and
later IEUNG, have always been used as a "silent" lead consonant for
words that in pronunciation start with a vowel. (IEUNG used to mean
"ng" also as a lead consonant, but since(?) no Korean words start with "ng",
and IEUNG looks a lot like YESIEUNG, a leading IEUNG became silent,
and YESIEUNG became obsolete (a silent trail consonant was always
omitted).) The FILLERs are entirely modern inventions, used for computer
representation (in jamos; and some compatibility encodings) mostly for
isolated letters, and partial syllables.
This process seems to match the Korean readers' interpretation of
Hangul syllables, and matches the description in the N954.PDF
working document of JTC1/SC22/WG20.
At least it has the merit of allowing unification of uncomposed SSANG
consonants, or uncomposed Y or E vowels that may appear even within
a text using only modern jamos or johab syllables. It also simplifies
the preparation of Hangul texts for UCA.
Yes, it does. But such preparation is not needed for collation, as it can
all be done inside of the collation table.
See http://std.dkuug.dk/jtc1/sc22/wg20/docs/n1051-hangulsort.pdf
(I'm working on an update of that document).
/kent k