Re: [unicode] U-Source ideographs mapped to themselves

From: mpsuzuki@hiroshima-u.ac.jp
Date: Sun Aug 29 2010 - 08:17:00 CDT


    On Sun, 29 Aug 2010 14:07:35 +0200
    Uriah Eisenstein <uriaheisenstein@gmail.com> wrote:
    >UAX #38 (Unihan) defines the kIRG_USource field as a reference into the
    >U-source ideograph database described in UTR #45, having the form "UTC
    >nnnnn". However, several CJK Compatibility Ideographs are mapped to their
    >own code point values, e.g. "U+FA0C kIRG_USource U+FA0C". The formal
    >syntax of kIRG_USource allows this, but I've found no explanation as to the
    >meaning of such a mapping; there is also no such mapping from a code point
    >to another code point.

    I think this is a good point. U+FA0C was originally
    introduced for round-trip conversion between ISO/IEC
    10646 and Big5, but it is rather difficult to learn
    that background from the properties in the current Unihan.txt.

    U+FA0C is still the easier example to understand, because
    its kDefinition mentions this. U+FA0D was also
    introduced for compatibility with Big5, but its entry
    does not say so.

    Recently, CJK compatibility ideographs have been proposed
    in order to assign code points to "characters" whose shape
    differences are unifiable with existing characters. By
    contrast, U+F900 - U+FA0B (for KS X 1001:1998 compatibility)
    and U+FA0C - U+FA0D (for Big5 compatibility) are exceptional,
    because their glyph shapes differ in no way from those of
    existing characters. Some people would expect this information
    to be available.
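
    The two exceptional ranges named above can be written down as a
    small lookup. This is only a sketch in Python: the range
    boundaries are taken from this message, while the names
    DUPLICATE_RANGES and duplicate_origin are hypothetical and not
    part of Unihan or any library.

```python
# Ranges of compatibility ideographs that, per the message above, have
# no glyph difference from existing characters. The labels are illustrative.
DUPLICATE_RANGES = [
    (0xF900, 0xFA0B, "KS X 1001:1998"),
    (0xFA0C, 0xFA0D, "Big5"),
]


def duplicate_origin(code_point):
    """Return the legacy standard a code point was added for, or None
    if it falls outside the two ranges listed above."""
    for first, last, origin in DUPLICATE_RANGES:
        if first <= code_point <= last:
            return origin
    return None


print(duplicate_origin(0xFA0C))  # Big5
print(duplicate_origin(0xFA0F))  # None
```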

    For compatibility characters with subtle differences
    in their shapes, I'm not sure whether the historical
    background is needed or not. The compatibility ideographs
    introduced for IBM Kanji for the Japanese market had
    subtle differences from the exemplification glyphs in the
    Japanese industrial standards at the time IBM developed them.
    Later, however, newer Japanese industrial standards
    recognized that it is reasonable to code some of them
    at separate code points. Therefore, Unihan.txt lists
    properties such as these:

    U+FA0F kIBMJapan FA9B
    U+FA0F kIRG_JSource 3-2F4B
    U+FA0F kIRG_USource U+FA0F
    U+FA0F kJIS0213 1,15,43
    U+FA0F kRSAdobe_Japan1_6 C+8421+32.3.7 C+8421+150.7.3

    I'm not sure whether all possible variants for JIS X 0213 can
    be identified as "compatible with IBMJapan".
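
    Entries whose kIRG_USource value is the code point itself, like
    the one listed for U+FA0F, can be picked out mechanically. The
    following is a minimal Python sketch, assuming each record has
    the shape "code point, property, value" separated by whitespace
    as in the listing above (the distributed Unihan files use tabs,
    which the same split also handles). The function name
    self_mapped_usource is hypothetical.

```python
def self_mapped_usource(lines):
    """Yield code points whose kIRG_USource value is the code point itself."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split into at most three fields so that values containing
        # spaces (e.g. kRSAdobe_Japan1_6) stay in one piece.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        code_point, prop, value = fields
        if prop == "kIRG_USource" and value.strip() == code_point:
            yield code_point


# The records quoted in this message, used as sample input.
SAMPLE = """\
U+FA0F kIBMJapan FA9B
U+FA0F kIRG_JSource 3-2F4B
U+FA0F kIRG_USource U+FA0F
U+FA0F kJIS0213 1,15,43
U+FA0F kRSAdobe_Japan1_6 C+8421+32.3.7 C+8421+150.7.3
""".splitlines()

print(list(self_mapped_usource(SAMPLE)))  # ['U+FA0F']
```

    Running the same function over an open file object for the full
    data would list every self-mapped entry, which could then be
    compared against the legacy-source properties on the same code
    point.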

    # I neglected to check who provided the font used to print the
    # characters introduced for IBM Kanji in ISO/IEC 10646.

    Uriah, do you think historical background information about each
    compatibility ideograph should be noted in Unihan.txt?

    Regards,
    mpsuzuki


