Re: Seemingly duplicated radicals, reasoning?

From: James Kass (thunder-bird@earthlink.net)
Date: Mon Dec 24 2007 - 16:40:56 CST

    Jeroen Ruigrok van der Werven wrote,

    > What I do not understand is why there are radicals in both blocks for at least
    > the following:
    >
    > U+2e95 and U+2f39 - radical snout (two) (a bit dubious one, since the latter
    > seems to have the bottom stroke drawn past the standing stroke)
    > U+2ed1 and U+2fa7 - radical long (one) (no apparent difference)
    > U+2ee3 and U+2fbb - radical bone (no apparent difference)
    > U+2ee4 and U+2fc1 - radical ghost (no apparent difference)
    >
    > Since I am going to use radicals for an application I am developing I want to
    > be sure I am not misunderstanding anything. If the above are indeed not
    > different would I just have to make sure I use the Kangxi radical over the CJK
    > Supplemental one?

    In the MingLiU font, the following pairs don't necessarily look
    identical. This is also true of other fonts, but the differences
    aren't always consistent.

    In MingLiU:
    ⺕ ⼹ - The first is about half height, and its second stroke
         extends through its first stroke.
    ⻑ ⾧ - The first looks as though it has one fewer stroke than
         the second.
    ⻣ ⾻ - In the first, the corner inside the box opens at the left
         rather than the right. In other words, the corner inside
         the box is an upper right corner in the first glyph and an
         upper left corner in the second.
    ⻤ ⿁ - The first looks as though it has one fewer stroke; stroke
         five of the first glyph extends to the baseline.

    As you probably know, all of the Kangxi radicals are also
    encoded as characters in the ideographic ranges.

    So, 彐 (#58), 長 (#168), 骨 (#188), and 鬼 (#194) could
    be referred to as U+5F50, U+9577, U+9AA8, and U+9B3C,
    respectively.
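
    For an application, one way to see this correspondence is through
    compatibility normalization. The following is only a rough sketch,
    assuming Python's standard unicodedata module; the helper names
    to_unified and describe are made up for illustration. It prints the
    name of each character in the four pairs above along with its NFKC
    form. The Kangxi radicals carry compatibility decompositions to the
    unified ideographs; whether a given CJK Radicals Supplement character
    maps anywhere depends on the Unicode data shipped with the Python
    build, so the sketch simply reports what it finds.

```python
import unicodedata

# Pairs from the question: (CJK Radicals Supplement, Kangxi Radicals).
PAIRS = [
    ("\u2e95", "\u2f39"),  # snout
    ("\u2ed1", "\u2fa7"),  # long
    ("\u2ee3", "\u2fbb"),  # bone
    ("\u2ee4", "\u2fc1"),  # ghost
]


def to_unified(ch: str) -> str:
    """Return the NFKC form of ch (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", ch)


def describe(ch: str) -> str:
    """Format code point, character name, and NFKC result for one character."""
    mapped = to_unified(ch)
    targets = " ".join(f"U+{ord(c):04X}" for c in mapped)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {targets}"


for supplement, kangxi in PAIRS:
    print(describe(supplement))
    print(describe(kangxi))
```

    If the goal is to store one form only, normalizing input this way
    folds the Kangxi radicals into the unified ideographs, at the cost
    of losing the radical/ideograph distinction.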

    Quoting from T.U.S. 5.0 page 426,
    "Semantics. Characters in the CJK and KangXi Radicals blocks should
    never be used as ideographs. They have different properties and meanings.
    U+2F00 KANGXI RADICAL ONE is not equivalent to U+4E00 CJK UNIFIED
    IDEOGRAPH-4E00, for example. The former is to be treated as a symbol,
    the latter as a word or part of a word.

    "The characters in the CJK and KangXi Radicals blocks are compatibility
    characters. Except in cases where it is necessary to make a semantic
    distinction between a Chinese character in its role as a radical and the
    same Chinese character in its role as an ideograph, the characters from
    the Unified Ideographs blocks should be used instead of the compatibility
    radicals. To emphasize this difference, radicals may be given a distinct
    font style from their ideographic counterparts."
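
    The "different properties" mentioned in that passage can be
    inspected directly. As a small sketch, again assuming Python's
    standard unicodedata module (the function name summarize is
    hypothetical), the following compares U+2F00 with U+4E00 by name,
    general category, and decomposition mapping:

```python
import unicodedata


def summarize(ch: str) -> tuple:
    """Return (code point, name, general category, decomposition) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # "So" = symbol, "Lo" = letter
        unicodedata.decomposition(ch),  # empty string if there is none
    )


radical = "\u2f00"    # KANGXI RADICAL ONE
ideograph = "\u4e00"  # CJK UNIFIED IDEOGRAPH-4E00

for ch in (radical, ideograph):
    print(summarize(ch))
```

    The radical reports a symbol category and a compatibility
    decomposition to U+4E00, while the ideograph reports a letter
    category and no decomposition.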

    In the section titled "CJK and KangXi Radicals: U+2E80-U+2FD5" which
    starts on page 425, it is mentioned that CNS 11643-1992 included a
    block of radicals in addition to the ideographs, and that the block
    contained 212 of the 214 radicals. This suggests that the inclusion
    of a separate block for the already encoded radicals (as ideographs)
    was done for reasons of compatibility.

    But I've never understood the reasoning behind all the duplications.
    Having just reread the pertinent section of the Unicode Standard, I
    don't find that it helps much. Of course, "KANGXI RADICAL ONE"
    is the ideograph encoded at U+4E00. The fact that some standards have
    chosen to encode that radical separately and assign it properties as
    a symbol doesn't alter that reality. The fact that many dictionaries
    use a different font style to display radicals in indices only means
    that it is a different font style, not that there is any difference
    at the character (plain text) level.

    Best regards,

    James Kass


