Re: TC/SC mapping

From: John H. Jenkins (jenkins@apple.com)
Date: Wed Jan 23 2002 - 11:29:57 EST


Basically, the kSimplifiedVariant and kTraditionalVariant fields of the
Unihan database were algorithmically derived from the GB 2312 and GB 12345
mapping tables (since the two are essentially the same standard, the one
with simplified and the other with traditional characters). As the
disclaimer notes, these fields have never been systematically checked or
proofed.
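
To illustrate the kind of derivation described above, here is a minimal
sketch in Python. It assumes two tables that map the same GB code position
to a simplified character in one and a traditional character in the other.
The table contents, the code positions, and the name derive_variants are all
hypothetical, chosen only for illustration; this is not the actual process
or data used for the Unihan database.

```python
# Hypothetical miniature tables: GB code position -> character.
# The code positions are placeholders, NOT real GB 2312 / GB 12345 values.
gb2312_sample = {"0001": "\u8721", "0002": "\u53f0", "0003": "\u4eba"}   # 蜡 台 人
gb12345_sample = {"0001": "\u881f", "0002": "\u81fa", "0003": "\u4eba"}  # 蠟 臺 人


def derive_variants(simplified_table, traditional_table):
    """Pair characters that share a code position in the two tables.

    Returns two dicts of sets:
      k_traditional: simplified char  -> traditional variants
      k_simplified:  traditional char -> simplified variants
    """
    k_traditional = {}
    k_simplified = {}
    for code, simp in simplified_table.items():
        trad = traditional_table.get(code)
        # Skip positions missing from the second table, and characters
        # that are identical in both (no simplification took place).
        if trad is None or trad == simp:
            continue
        k_traditional.setdefault(simp, set()).add(trad)
        k_simplified.setdefault(trad, set()).add(simp)
    return k_traditional, k_simplified


k_traditional, k_simplified = derive_variants(gb2312_sample, gb12345_sample)
```

With the sample tables, k_traditional maps U+8721 to U+881F and U+53F0 to
U+81FA, and U+4EBA gets no entry because it is the same in both tables.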

As I've been fixing other problems in the Unihan database lately, however,
I have been making corrections as I come across them. Again, this is not a
systematic effort. That's still down the road a ways, although I expect
we'll actually get to it in the next couple of months.

Some of the instances you note are clearly errors and should be fixed,
although I can't guarantee they'll make the Unicode 3.2 final version.
Meanwhile, it is true that there are simplified characters which
correspond to more than one traditional form. In the case of U+8721 (蜡),
it is *both* a traditional character in its own right *and* the simplified
form for another character, U+881F (蠟).

Characters which are simplifications for more than one traditional form
are quite common. Just to do a quick survey, I pulled one dictionary off
my shelf. It has in the back a table of simplifications. The first page
has 99 simplified characters, five of which are simplifications for more
than one traditional form. Perhaps that many again are also traditional
characters in their own right. This is also missing out on some of the
more spectacular instances, such as U+53F0 (台), which is a traditional
character itself *and* the simplified form for three others, U+6AAF (檯),
U+81FA (臺), and U+98B1 (颱). There's at least one other character which is
the simplified form for four traditional ones, but off-hand I can't
remember what it is.
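
One practical consequence is that a simplified-to-traditional lookup has to
return a set of candidates rather than a single character. The sketch below
shows one way to model that in Python, using only the two examples given in
this message. The names TRADITIONAL_FORMS, ALSO_TRADITIONAL, and
traditional_candidates are hypothetical, and the tables are nowhere near
complete.

```python
# Simplified character -> traditional forms it can stand for.
# Only the two examples from this message are included.
TRADITIONAL_FORMS = {
    "\u8721": {"\u881f"},                      # 蜡 -> 蠟
    "\u53f0": {"\u6aaf", "\u81fa", "\u98b1"},  # 台 -> 檯, 臺, 颱
}

# Characters that are also traditional characters in their own right.
ALSO_TRADITIONAL = {"\u8721", "\u53f0"}        # 蜡, 台


def traditional_candidates(ch):
    """Return every traditional character that ch might represent."""
    candidates = set(TRADITIONAL_FORMS.get(ch, ()))
    # Keep the character itself when it is a traditional character too,
    # or when nothing is known about it (assume it is unchanged).
    if ch in ALSO_TRADITIONAL or not candidates:
        candidates.add(ch)
    return candidates
```

For U+53F0 this yields four candidates: the character itself plus U+6AAF,
U+81FA, and U+98B1. Choosing among them needs context that a character-level
table cannot supply.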

==========
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/



This archive was generated by hypermail 2.1.2 : Wed Jan 23 2002 - 10:59:06 EST