From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu May 15 2003 - 05:43:31 EDT
John Jenkins wrote:
> There is a kTotalStrokes field in Unihan.txt, although it
> doesn't cover every character in Unihan. This would
> definitely be a good place to start.
Not so good. What Gary needs is the *sequence* of all strokes composing each
character. Once he has that data, the total number of strokes of each
character is simply the length of its sequence.
A better starting point would be a database of IDS decompositions of CJK
ideographs. E.g.:
(DB#1: IDS decompositions)
喻 = ⿰ 口 ⿱ ⿱ 人 一 ⿰ 月 刂
U+55BB = LeftRight(MOUTH, TopBottom(TopBottom(MAN, ONE),
LeftRight(MOON, KNIFE)))
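To make the prefix notation concrete, here is a minimal sketch of reading one IDS string into a nested structure. The names IDS_ARITY and parse_ids are hypothetical, and the sketch assumes only the twelve operators U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two.

```python
# Hypothetical helper names; assumes only the operators U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands, all the others take two.
IDS_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDS_ARITY["\u2FF2"] = 3
IDS_ARITY["\u2FF3"] = 3


def parse_ids(text):
    """Parse a prefix-notation IDS string into nested tuples.

    An operator becomes (operator, operand, operand, ...);
    any other character is returned as a plain string.
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("IDS ends before all operands are supplied")
        ch = tokens[pos]
        pos += 1
        if ch in IDS_ARITY:
            # Read as many operands as the operator requires, in order.
            operands = [parse() for _ in range(IDS_ARITY[ch])]
            return (ch, *operands)
        return ch

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected characters after a complete IDS")
    return tree


tree = parse_ids("⿰ 口 ⿱ ⿱ 人 一 ⿰ 月 刂")
print(tree)
```

The nested tuples mirror the LeftRight/TopBottom form above: the outer operator holds 口 on the left and the whole right-hand part as a second subtree.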
Once you have that, building a strokes database is quite trivial. First, all
the IDS operators are useless for this purpose and should be stripped off:
(DB#2: Decompositions in atomic components)
喻 = 口 人 一 月 刂
U+55BB = { MOUTH, MAN, ONE, MOON, KNIFE }
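Stripping the operators can be sketched in a few lines. The function name strip_ids_operators is hypothetical, and the sketch assumes the operators are exactly the code points U+2FF0..U+2FFB and that components are separated by optional spaces, as in the example above.

```python
def strip_ids_operators(ids):
    """Return the components of an IDS string, in order, without operators.

    Assumes the IDS operators are the code points U+2FF0..U+2FFB.
    """
    return [
        ch
        for ch in ids
        if not ch.isspace() and not (0x2FF0 <= ord(ch) <= 0x2FFB)
    ]


components = strip_ids_operators("⿰ 口 ⿱ ⿱ 人 一 ⿰ 月 刂")
print(" ".join(components))
```

Because an IDS lists its operands in reading order, dropping the operators keeps the components in the same left-to-right order as the DB#2 entry.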
Then, a database of strokes for all the atomic components is needed. This
should not be such a huge job, because only a few hundred such components are
supposed to exist:
(DB#3: Stroke sequences of atomic components)
口 = 丨 乙 一
MOUTH = { shu, zhe, heng }
人 = 丿 丶
MAN = { pie, na }
一 = 一
ONE = { heng }
月 = 丿 乙 一 一
MOON = { pie, zhe, heng, heng }
刂 = 丨 亅
KNIFE = { shu, shugou }
At this point, it is easy to automatically expand the components of DB#2 to
the corresponding stroke sequences of DB#3:
(DB#4: CJK stroke sequences)
喻 = 丨 乙 一 丿 丶 一 丿 乙 一 一 丨 亅
U+55BB = { shu, zhe, heng, pie, na, heng, pie, zhe, heng, heng,
shu, shugou }
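The expansion step amounts to a table lookup followed by concatenation. The sketch below hard-codes only the five DB#3 entries shown above; the names STROKES and expand_to_strokes are hypothetical, and a real table would need an entry for every atomic component.

```python
# DB#3 entries copied from the examples above (stroke names in pinyin).
STROKES = {
    "口": ["shu", "zhe", "heng"],
    "人": ["pie", "na"],
    "一": ["heng"],
    "月": ["pie", "zhe", "heng", "heng"],
    "刂": ["shu", "shugou"],
}


def expand_to_strokes(components, table):
    """Concatenate the stroke sequences of the given components."""
    sequence = []
    for component in components:
        # Raises KeyError if a component is missing from the table.
        sequence.extend(table[component])
    return sequence


# DB#2 entry for U+55BB.
strokes = expand_to_strokes(["口", "人", "一", "月", "刂"], STROKES)
print(", ".join(strokes))
print(len(strokes))  # total stroke count is the length of the sequence
```

The length of the resulting list gives the total stroke count, which is the point made above about kTotalStrokes being derivable from the sequences.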
DB#1 would be useful for a number of purposes, but building it is a pain in
the neck! (Just to be 100% clear, I'd like to have it, but I am *not*
volunteering to do it. :-)
_ Marco