RE: TC/SC mapping

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Jan 23 2002 - 05:47:35 EST


Doug Ewell wrote:
> U+4E48 kSimplifiedVariant U+9EBD
> U+4E48 kTraditionalVariant U+9EBD
> ...
> U+540E kSimplifiedVariant U+5F8C
> U+540E kTraditionalVariant U+5F8C
> ...
> U+5F8C kSimplifiedVariant U+540E
> U+5F8C kTraditionalVariant U+540E
> ...
> U+9EBD kSimplifiedVariant U+4E48
> U+9EBD kTraditionalVariant U+4E48
>
> This means that U+4E48 and U+9EBD are both simplified *and*
> traditional variants of each other, and U+540E and U+5F8C
> are both simplified *and* traditional variants of each
> other! Can this be true?

As a matter of fact, U+4E48 (么) is the simplified form of U+9EBD (麽). So I
guess that the kSimplifiedVariant field for U+4E48 and the
kTraditionalVariant field for U+9EBD are mistakes and should simply be
removed.

I suspect that the other case (U+540E and U+5F8C) is similar, but I am not
sure.
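
Entries like these could be flagged mechanically. Here is a minimal sketch in Python, assuming each record is a whitespace-separated line of the form "code point, field name, value(s)" as in the excerpts quoted above. The names parse_variants and contradictory_targets are made up for this illustration and are not part of any Unihan tool. It reports every character whose kSimplifiedVariant and kTraditionalVariant fields point at the same target:

```python
from collections import defaultdict

# Sample records copied from the excerpts quoted in this thread.
SAMPLE = """\
U+4E48 kSimplifiedVariant U+9EBD
U+4E48 kTraditionalVariant U+9EBD
U+540E kSimplifiedVariant U+5F8C
U+540E kTraditionalVariant U+5F8C
U+5F8C kSimplifiedVariant U+540E
U+5F8C kTraditionalVariant U+540E
U+9EBD kSimplifiedVariant U+4E48
U+9EBD kTraditionalVariant U+4E48
U+4F59 kSimplifiedVariant U+9980
U+4F59 kTraditionalVariant U+9918
"""


def parse_variants(text):
    """Map (code point, field) to the set of values listed for it.

    Assumption: fields are separated by whitespace (tabs or spaces).
    """
    table = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if line.strip().startswith("#") or len(parts) < 3:
            continue  # skip comments, blank lines and malformed records
        table[(parts[0], parts[1])].update(parts[2:])
    return table


def contradictory_targets(table):
    """Return (character, target) pairs where the same target is listed
    as both the simplified and the traditional variant."""
    result = []
    for (code, field), targets in table.items():
        if field != "kSimplifiedVariant":
            continue
        traditional = table.get((code, "kTraditionalVariant"), set())
        result.extend((code, target) for target in sorted(targets & traditional))
    return sorted(result)


if __name__ == "__main__":
    for code, target in contradictory_targets(parse_variants(SAMPLE)):
        print(code, "lists", target, "as both simplified and traditional")
```

Such a check only finds suspicious records. Deciding which of the two fields is the mistaken one still needs someone who knows the characters.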

> I also noticed:
>
> U+4F59 kSimplifiedVariant U+9980
> U+4F59 kTraditionalVariant U+9918
> ...
> U+9918 kSimplifiedVariant U+4F59
> ...
> U+9980 kTraditionalVariant U+4F59
>
> which seems strange. If the simplified variant of U+4F59 is
> U+9980, and the traditional variant of U+4F59 is U+9918,
> then what is U+4F59?

Perhaps U+4F59 (余) is the *Japanese* simplified form, while U+9980 (馀) is
the *Chinese* simplified form, both corresponding to the traditional form
U+9918 (餘).

A very well-known case of such triplets is the verb "to sell": Japanese
simplified form U+58F2 (売), Chinese simplified form U+5356 (卖), traditional
form U+8CE3 (賣).

However, in this case, UniHan seems to express the relationship with the
Japanese form only through the kZVariant field:

        U+05356 kTraditionalVariant U+08CE3
        U+05356 kZVariant U+08CE3
        ...
        U+058F2 kZVariant U+08CE3
        ...
        U+08CE3 kSimplifiedVariant U+05356
        U+08CE3 kZVariant U+058F2
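
To recover the whole triplet from records like these, one could treat every variant field as an undirected link and collect the connected groups. The following is only a sketch under that assumption (ignoring the field names loses the direction of each relationship), and variant_clusters and to_char are hypothetical names used for illustration:

```python
from collections import defaultdict

# The five records shown above, with the indentation dropped.
ENTRIES = """\
U+05356 kTraditionalVariant U+08CE3
U+05356 kZVariant U+08CE3
U+058F2 kZVariant U+08CE3
U+08CE3 kSimplifiedVariant U+05356
U+08CE3 kZVariant U+058F2
"""


def to_char(code):
    """Convert a 'U+XXXX' string to the character it names."""
    return chr(int(code[2:], 16))


def variant_clusters(text):
    """Group characters connected through any variant field."""
    graph = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        source = to_char(parts[0])
        for value in parts[2:]:
            target = to_char(value)
            graph[source].add(target)
            graph[target].add(source)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this character.
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    for cluster in variant_clusters(ENTRIES):
        print(" ".join(cluster))
```

With the five records above, the three forms of "to sell" end up in a single group, even though only the kZVariant field ties in the Japanese form.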

> In the Unicode 3.2 (beta) UniHan file, there is a new twist:
> characters whose traditional equivalent is given as TWO
> characters:
>
> U+836F kTraditionalVariant U+846F U+85E5

Gulp! But the format information included in the file reads:

        # kTraditionalVariant
        # The Unicode value for a (Chinese) traditional variant for this character.

So, there should be *a*, that is *one*, traditional variant...
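
Anyone whose parser takes that description literally may want to check for such records before relying on it. Here is a minimal sketch, again assuming whitespace-separated records like the one quoted above. The function name multi_valued_fields is hypothetical:

```python
# One multi-valued record and one single-valued record, both quoted earlier
# in this thread.
RECORDS = """\
U+836F kTraditionalVariant U+846F U+85E5
U+9918 kSimplifiedVariant U+4F59
"""


def multi_valued_fields(text):
    """Return (code point, field, values) for records listing several values."""
    found = []
    for line in text.splitlines():
        parts = line.split()
        if line.strip().startswith("#") or len(parts) < 3:
            continue  # skip comments, blank lines and malformed records
        code, field, values = parts[0], parts[1], parts[2:]
        if len(values) > 1:
            found.append((code, field, values))
    return found


if __name__ == "__main__":
    for code, field, values in multi_valued_fields(RECORDS):
        print(code, field, "has", len(values), "values:", " ".join(values))
```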

_ Marco


