From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 06 2011 - 18:08:21 CST
Magnus responded to Samuel Gilman's query:
> > In the Unihan_Variants.txt it seems to show when the characters vary but
> > it's unclear to me.
> > U+3469 kTraditionalVariant U+5138
> > U+346E kSimplifiedVariant U+2B748
> > U+346F kSimplifiedVariant U+3454
> > U+346F kTraditionalVariant U+3454
> > U+3473 kSimplifiedVariant U+3447
> > U+3473 kTraditionalVariant U+3447
> > I took this straight out of Unihan_Variants.txt.
> > Can someone explain what this means?
> > All I need from this is to figure out which variant is traditional and
> > which form is simplified.
>
> I'll quote an answer I got to a similar question from August 2008:
>
> "Please see the description for field kSimplifiedVariant in [1]:
>
> Note that a character can be *both* a traditional Chinese character in its
> own right *and* the simplified variant for other characters (e.g., U+53F0).

In this case, however, the problems with traditional and
simplified mappings in Unihan_Variants.txt are a known
defect in the data in that file. The Unicode CJK experts
have been working to correct that data in the master database
used to generate Unihan_Variants.txt, and a notice will be
posted when corrections are available.
In the meantime, the safest alternative for people working
on traditional/simplified mappings would be to ignore the
Version 6.0 Unihan_Variants.txt and use the Version 5.2
file instead, which doesn't have the data corruption problems
that beset that part of the Version 6.0 Unihan file. See:
http://www.unicode.org/Public/5.2.0/ucd/Unihan.zip
--Ken
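
For readers who want to apply the field descriptions mechanically, here is a
minimal sketch in Python. It is not part of the original thread. It assumes
each data line has the form "codepoint, field name, value(s)" separated by
whitespace, as in the lines quoted above, and it assumes the usual reading of
the fields: a character that carries kSimplifiedVariant is a traditional form,
and a character that carries kTraditionalVariant is a simplified form. The
names parse_variants and classify are hypothetical helpers, not anything
shipped with the Unicode Character Database.

```python
from collections import defaultdict

# Sample lines copied from the query quoted in this thread.
SAMPLE = """\
U+3469 kTraditionalVariant U+5138
U+346E kSimplifiedVariant U+2B748
U+346F kSimplifiedVariant U+3454
U+346F kTraditionalVariant U+3454
U+3473 kSimplifiedVariant U+3447
U+3473 kTraditionalVariant U+3447
"""


def parse_variants(lines):
    """Collect kSimplifiedVariant / kTraditionalVariant mappings.

    Returns two dicts keyed by code point string (e.g. "U+3469"):
    simplified_of[c]  -> set of simplified variants listed for c
    traditional_of[c] -> set of traditional variants listed for c
    """
    simplified_of = defaultdict(set)
    traditional_of = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # not a "codepoint field value" line
        source, field, values = parts
        if field == "kSimplifiedVariant":
            target = simplified_of
        elif field == "kTraditionalVariant":
            target = traditional_of
        else:
            continue  # other variant fields are ignored in this sketch
        for value in values.split():
            # Assumption: drop any "<source" suffix attached to a value.
            target[source].add(value.split("<", 1)[0])
    return simplified_of, traditional_of


def classify(codepoint, simplified_of, traditional_of):
    """Label a code point by which of the two fields it carries."""
    has_simplified = codepoint in simplified_of
    has_traditional = codepoint in traditional_of
    if has_simplified and has_traditional:
        return "both"
    if has_simplified:
        return "traditional"  # it points at a simplified form
    if has_traditional:
        return "simplified"   # it points at a traditional form
    return "unknown"


simplified_of, traditional_of = parse_variants(SAMPLE.splitlines())
for cp in ("U+3469", "U+346E", "U+346F", "U+3473"):
    print(cp, classify(cp, simplified_of, traditional_of))
```

On the sample lines, U+3469 comes out as "simplified", U+346E as
"traditional", and U+346F and U+3473 as "both". A "both" result can be
legitimate, as the note about U+53F0 says, so the label alone does not show
whether an entry is affected by the Version 6.0 defect. Running the same
sketch over the Version 5.2 file and comparing the results is one way to spot
differences.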