From: Allen Haaheim (haaheima@interchange.ubc.ca)
Date: Thu May 15 2003 - 18:36:49 EDT
Andrew wrote:
>In dictionaries that give a Stroke Order index, strokes are usually
>sub-sorted by the stroke category of the first one or two strokes of the
>character.
In indexes ordered by stroke count, the sub-sort is more often by radical
than first stroke(s). The only dictionary I have at home that sub-sorts by
first stroke(s) is _Cihai_.
Marco wrote:
>So, considering only the first two strokes of each character, would result
>in big groups of characters being sorted randomly
These groups need not be random. The 1989 _Cihai_'s "First-two-strokes"
index is sub-sorted by radical, which leaves sub-groupings of only five or
ten characters on average. The resulting groups are therefore more tightly
organized (though the sub-grouping is not made explicit with headings) than
those of radical/stroke tables. Even stopping at the first-two-stroke
headings yields groups of characters no larger than the groups of truly
randomly ordered characters in a radical/stroke index. For example, compare
the 1989 _Cihai_'s first-two-strokes/radical table to the radical/stroke
table in a dictionary of comparable size, _Hanyu da zidian_.
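To make the ordering concrete, here is a minimal sketch of a stroke-count /
first-two-strokes / radical sort key next to a plain radical/stroke key. It
is only an illustration, not a transcription of any dictionary's index: the
five stroke categories and their order (horizontal, vertical, left-falling,
dot, turning) are an assumption, the names Entry, ENTRIES, stroke_index_key,
radical_stroke_key and ordered are hypothetical, and the eight sample
characters were picked by hand.

```python
from typing import NamedTuple, Tuple

# Assumed stroke categories and their order (not taken from any dictionary):
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot / right-falling,
# 5 = turning (any stroke with a bend).

class Entry(NamedTuple):
    char: str
    strokes: int                 # total stroke count
    first_two: Tuple[int, int]   # categories of the first two strokes
    radical: int                 # Kangxi radical number

# Hand-picked sample entries, for illustration only.
ENTRIES = [
    Entry("火", 4, (4, 3), 86),
    Entry("水", 4, (2, 5), 85),
    Entry("人", 2, (3, 4), 9),
    Entry("月", 4, (3, 5), 74),
    Entry("木", 4, (1, 2), 75),
    Entry("日", 4, (2, 5), 72),
    Entry("王", 4, (1, 1), 96),
    Entry("十", 2, (1, 2), 24),
]

def stroke_index_key(e: Entry):
    """Stroke count, then first two strokes, then radical as the tie-breaker."""
    return (e.strokes, e.first_two, e.radical)

def radical_stroke_key(e: Entry):
    """Radical, then stroke count. Total strokes stand in for residual
    strokes here; within one radical the two give the same order."""
    return (e.radical, e.strokes)

def ordered(entries, key) -> str:
    """Return the characters of `entries` as one string, sorted by `key`."""
    return "".join(e.char for e in sorted(entries, key=key))

if __name__ == "__main__":
    print(ordered(ENTRIES, stroke_index_key))
    print(ordered(ENTRIES, radical_stroke_key))
```

In this sample, 日 and 水 share both stroke count and first two strokes, so
only the radical sub-sort fixes their relative order; without it they would
fall into one undifferentiated group.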
This being said, I am not doubting that radical/stroke is (for the
initiated) the fastest, most convenient, most commonly found and most
commonly used method, whereas stroke/radical (not stroke alone) is used as
the next alternative when radical/stroke fails to yield the character
(usually when the radical is unclear or guessed wrong).
Regards,
Allen Haaheim
----- Original Message -----
From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
To: "'Andrew C. West'" <andrewcwest@alumni.princeton.edu>;
<unicode@unicode.org>
Sent: Thursday, May 15, 2003 4:54 AM
Subject: RE: how to sort by stroke (not radical/stroke)
> Andrew C. West wrote:
> > [...]
> > I'm not sure that's what he wants either. In dictionaries
> > that give a Stroke Order index, strokes are usually
> > sub-sorted by the stroke category of the first
> > one or two strokes of the character.
>
> I doubt that this would be sufficient in all cases. The radicals are often
> the first (left, top) component of a character, and most radicals have many
> more than two strokes. So, considering only the first two strokes of each
> character, would result in big groups of characters being sorted randomly
> (i.e., all those characters whose radical is bigger than two strokes and
> whose residual stroke count is the same).
>
> > [...]
> > Coincidentally I've recently been in contact with someone who
> > has spent the last ten years creating a database of CJK
> > ideographs,
> > [...]
> > first two strokes of each character). The main problem with
> > ideographic decompositions is that not all discrete
> > ideographic components are [currently] encoded within Unicode
> > - there are about 100 unencoded ideographic components
> > according to this person.
>
> Does this also include the (relatively new) "CJK components" block? If yes,
> it might be worth filing a proposal to add those components, in order to
> complete the IDS sub-system.
>
> _ Marco
>
>