From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Tue Aug 03 2004 - 06:47:36 CDT
I don't know if this has been asked or reported before, but is the example code
for Hangul composition in UAX 15 correct?
The code is:
public static String composeHangul(String source) {
    int len = source.length();
    if (len == 0) return "";
    StringBuffer result = new StringBuffer();
    char last = source.charAt(0); // copy first char
    result.append(last);
    for (int i = 1; i < len; ++i) {
        char ch = source.charAt(i);
        // 1. check to see if two current characters are L and V
        int LIndex = last - LBase;
        if (0 <= LIndex && LIndex < LCount) {
            int VIndex = ch - VBase;
            if (0 <= VIndex && VIndex < VCount) {
                // make syllable of form LV
                last = (char)(SBase + (LIndex * VCount + VIndex) * TCount);
                result.setCharAt(result.length()-1, last); // reset last
                continue; // discard ch
            }
        }
        // 2. check to see if two current characters are LV and T
        int SIndex = last - SBase;
        if (0 <= SIndex && SIndex < SCount && (SIndex % TCount) == 0) {
            int TIndex = ch - TBase;
            if (0 <= TIndex && TIndex <= TCount) {
                // make syllable of form LVT
                last += TIndex;
                result.setCharAt(result.length()-1, last); // reset last
                continue; // discard ch
            }
        }
        // if neither case was true, just add the character
        last = ch;
        result.append(ch);
    }
    return result.toString();
}
Suppose I feed it 0xAC00 0x11C3. 0xAC00 is an LV syllable, so the
code takes step 2:

SIndex = 0xAC00 - 0xAC00 = 0
TIndex = 0x11C3 - 0x11A7 = 28

Since TIndex == TCount, the test "(0 <= TIndex && TIndex <= TCount)" is
true, so 0x11C3 is discarded and the output is 0xAC00 + 28 = 0xAC1C,
which is not an LVT but an LV syllable!
The TIndex <= TCount should be TIndex < TCount, I think. IMO the
example would be clearer if it used the Hangul_Syllable_Type
property.
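Along the lines suggested above, here is a minimal sketch of the same loop with the checks pulled out into predicates named after the Hangul_Syllable_Type values. The predicates (isL, isV, isT, isLV) are hypothetical helpers, not a library API, and they deliberately cover only the jamo ranges the algorithm can compose: the real property also classifies archaic jamo such as U+11C3 as T, so a plain property lookup would still need the range check. Besides the exclusive upper bound, isT also uses 0 < TIndex, since TIndex 0 encodes "no trailing consonant" and TBase (U+11A7) is not itself a trailing consonant.

```java
public class HangulCompose {
    // Standard constants from the Hangul algorithm in UAX 15.
    static final int SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7;
    static final int LCount = 19, VCount = 21, TCount = 28;
    static final int NCount = VCount * TCount; // 588
    static final int SCount = LCount * NCount; // 11172

    // Hypothetical predicates named after Hangul_Syllable_Type values,
    // restricted to the ranges the algorithm can actually compose.
    static boolean isL(char c) { int i = c - LBase; return 0 <= i && i < LCount; }
    static boolean isV(char c) { int i = c - VBase; return 0 <= i && i < VCount; }
    // TIndex 0 means "no trailing consonant", so it is excluded here.
    static boolean isT(char c) { int i = c - TBase; return 0 < i && i < TCount; }
    static boolean isLV(char c) {
        int i = c - SBase;
        return 0 <= i && i < SCount && i % TCount == 0;
    }

    public static String composeHangul(String source) {
        int len = source.length();
        if (len == 0) return "";
        StringBuilder result = new StringBuilder();
        char last = source.charAt(0);
        result.append(last);
        for (int i = 1; i < len; ++i) {
            char ch = source.charAt(i);
            if (isL(last) && isV(ch)) {
                // L + V -> LV syllable; ch is absorbed
                int LIndex = last - LBase, VIndex = ch - VBase;
                last = (char) (SBase + (LIndex * VCount + VIndex) * TCount);
                result.setCharAt(result.length() - 1, last);
            } else if (isLV(last) && isT(ch)) {
                // LV + T -> LVT syllable; ch is absorbed
                last = (char) (last + (ch - TBase));
                result.setCharAt(result.length() - 1, last);
            } else {
                // no composition: keep ch as the new "last"
                last = ch;
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // U+AC00 U+11C3 stays two characters: U+11C3 is outside the T range.
        System.out.println(composeHangul("\uAC00\u11C3").length());
        // U+1100 U+1161 U+11A8 composes to the single syllable U+AC01.
        System.out.printf("%04X%n", (int) composeHangul("\u1100\u1161\u11A8").charAt(0));
    }
}
```

The sketch swaps StringBuffer for StringBuilder, since nothing here needs synchronization; otherwise it follows the structure of the quoted code.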
A somewhat related question. I know next to nothing about Hangul
[de]composition, so forgive me if this is a silly question. The
UnicodeData.txt file contains many more jamos than the 19 L, 21 V,
and 27 T jamos. Are the other jamos not used to compose syllables, or
does the syllable block represent an incomplete set of compatibility
characters? Which is it?
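For reference, the counts the algorithm works with follow directly from its code point ranges. The sketch below uses hypothetical constant names with the standard range values, and shows that 19 × 21 × 28 = 11,172, exactly the size of the syllable block U+AC00..U+D7A3. TCount is 28 rather than 27 because TIndex 0 stands for "no trailing consonant". Jamo outside these ranges, such as U+11C3 above, have no precomposed syllable that the algorithm can produce.

```java
public class JamoRanges {
    // First and last code points of the ranges used by the composition algorithm.
    static final int L_FIRST = 0x1100, L_LAST = 0x1112;
    static final int V_FIRST = 0x1161, V_LAST = 0x1175;
    static final int T_FIRST = 0x11A8, T_LAST = 0x11C2;
    static final int S_FIRST = 0xAC00, S_LAST = 0xD7A3;

    // Number of code points in an inclusive range.
    static int count(int first, int last) {
        return last - first + 1;
    }

    public static void main(String[] args) {
        int l = count(L_FIRST, L_LAST); // 19
        int v = count(V_FIRST, V_LAST); // 21
        int t = count(T_FIRST, T_LAST); // 27
        // TCount = t + 1, because TIndex 0 encodes "no trailing consonant".
        int syllables = l * v * (t + 1); // 11172
        System.out.println(l + " L, " + v + " V, " + t + " T -> " + syllables + " syllables");
        System.out.println("syllable block size: " + count(S_FIRST, S_LAST));
    }
}
```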
Theo