Re: Are Unihan variant relations expected to be symmetrical?

From: John H. Jenkins (jenkins@apple.com)
Date: Tue Jun 29 2010 - 14:36:30 CDT


    The kZVariant field has bad data in it that we haven't had time to clean up. It should, in theory, be symmetrical, and it should, in theory, contain only unifiable forms, but as you note, it doesn't. In addition to pairs kept separate because of the source separation rule, it should also cover characters that were added to the standard in error.
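    If kZVariant (or kSemanticVariant) is meant to be symmetrical, one-directional entries can be listed mechanically. The sketch below is only illustrative. It assumes the tab-separated Unihan line layout "U+XXXX<TAB>field<TAB>value", with space-separated values and an optional "<source" suffix on each value. The helper names parse_variants and find_asymmetric are hypothetical. The sample lines contain only the two relations mentioned in this thread.

```python
from collections import defaultdict

def parse_variants(lines, field):
    """Map each code point to the set of code points listed for `field`.

    Assumes (hypothetically) tab-separated lines like 'U+4E80\tkZVariant\tU+9F9C';
    any '<source' annotation after a value is dropped.
    """
    table = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # not the field we are looking at
        src, _, value = parts
        for item in value.split():
            table[src].add(item.split("<", 1)[0])
    return table

def find_asymmetric(table):
    """Return sorted (a, b) pairs where a lists b but b does not list a."""
    return sorted(
        (a, b)
        for a, targets in table.items()
        for b in targets
        if a not in table.get(b, ())
    )

# The two one-directional relations described in this thread.
sample = [
    "U+4E80\tkZVariant\tU+9F9C",
    "U+758D\tkSemanticVariant\tU+86CB",
]
print(find_asymmetric(parse_variants(sample, "kZVariant")))
print(find_asymmetric(parse_variants(sample, "kSemanticVariant")))
```

    A report like this only flags candidates for cleanup. It does not say which direction is correct.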

    In any event, I'm afraid that right now it's probably best not to rely on it for anything.

    On Jun 29, 2010, at 8:25 AM, Uriah Eisenstein wrote:

    > Hi,
    > To clarify my question with an example :) The character 亀 (U+4E80) is listed in Unihan as a Z-variant of 龜 (U+9F9C). However, the opposite is not true. Similarly, 疍 (U+758D) is listed as a semantic variant of 蛋 (U+86CB), but not vice versa. From the definitions of these variant types in UAX#38, one would naturally expect them to be symmetrical, and both characters to show each other as variants. There are quite a few other such cases, although it does appear that in most cases the relation is symmetrical.
    > My reason for asking, BTW, is that I'm thinking of grouping characters which are Z-variants of each other in an application, so I need to understand whether Z-variants are expected to form clear "cliques" in which each character is a Z-variant of all the others.
    > I realize that the semantic variant relation, at least, is based on external sources and not determined by Unicode; regarding Z-variants, I'm not sure. I'd like to know, though, whether the relation is expected to be symmetrical and the above cases should be considered errors, whether a one-directional relation carries some meaning, or something else.
    > On a side note, some Z-variants I've looked at seem to have very different abstract shapes, in some cases looking more like simplified/traditional pairs. As I said, I'm not clear on how they are determined. Are they supposed to be exactly those pairs which would be unified if it were not for the Source Separation Rule?
    >
    > TIA,
    > Uriah
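    Uriah's grouping idea can be sketched under stated assumptions, keeping in mind the advice above not to rely on the current kZVariant data. This sketch treats every listed pair as symmetric, merges pairs into connected components with a union-find, and then checks whether each component is actually a clique. The functions group_variants and is_clique are hypothetical helpers, and the table is a plain dict of code point to listed variants, such as the one built from the Unihan lines. The placeholder names "A", "B" and "C" are made up to show a component that is not a clique.

```python
def group_variants(table):
    """Connected components of the symmetric closure of a variant table."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, targets in table.items():
        find(a)
        for b in targets:
            union(a, b)  # one direction is enough to link the pair

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

def is_clique(group, table):
    """True if every pair in `group` is listed directly, in either direction."""
    def linked(a, b):
        return b in table.get(a, ()) or a in table.get(b, ())
    return all(linked(a, b) for i, a in enumerate(group) for b in group[i + 1:])

# Hypothetical chain: A lists B and B lists C, but nothing links A and C directly.
chain = {"A": {"B"}, "B": {"C"}}
for g in group_variants(chain):
    print(g, is_clique(g, chain))
```

    A component built this way can be larger than any true clique. So the clique check, not the grouping, is what shows whether the data supports Uriah's assumption.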

    =====
    John H. Jenkins
    jenkins@apple.com



    This archive was generated by hypermail 2.1.5 : Tue Jun 29 2010 - 14:41:18 CDT