From: Frank Yung-Fong Tang (franktang@gmail.com)
Date: Wed Apr 06 2005 - 08:43:10 CST
I'm not sure how you can map Big5HKSCS to GBK, particularly once you
take GB18030 into account. GB18030 already defines how Big5HKSCS should
map to the area outside GBK; if you map those characters into GBK
instead, the mapped result will be incompatible with GB18030.
Why do you care about GBK? GBK is neither a national standard nor a de
facto standard. GB18030, which is a superset of GBK, is a national
standard and is what you should use these days. Both ICU and
Netscape/Mozilla have supported GB18030 since at least as early as 2001.
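
To illustrate the superset point, here is a minimal sketch in Python, assuming the standard library codecs named "gbk" and "gb18030" behave like the encodings discussed here. The helper names can_encode and unmappable are hypothetical, and U+20000 is used only as a stand-in for the U+2xxxx code points mentioned in the quoted message, not as a specific HKSCS character.

```python
def can_encode(ch: str, codec: str) -> bool:
    """Return True if `ch` has a representation in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unmappable(text: str, codec: str) -> list[str]:
    """List the characters of `text` that `codec` cannot represent."""
    return [ch for ch in text if not can_encode(ch, codec)]


# Two BMP ideographs plus one supplementary-plane code point (U+20000).
sample = "中文" + chr(0x20000)

# GBK has no slot for the supplementary-plane character,
# while GB18030 encodes it as a four-byte sequence.
print([hex(ord(ch)) for ch in unmappable(sample, "gbk")])
print([hex(ord(ch)) for ch in unmappable(sample, "gb18030")])
```

Running the same check over a full Big5HKSCS repertoire would be one way to see which characters need a fuzzy fallback only because the target is GBK rather than GB18030.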
On Apr 5, 2005 6:09 PM, Ken Krugler <ken@transpac.com> wrote:
> I'm trying to generate a fairly complete mapping between these two
> legacy encodings, where fuzzy equivalence is OK (and preferable to no
> mapping).
>
> I've been using various .ucm files from ICU, as well as the
> UniHan.txt file (for Simplified & Traditional variants).
>
> This has worked reasonably well for GBK->Big-5+HKSCS, as expected.
> Out of the 7601 characters in GBK that I've got glyph data for, only
> 268 can't be mapped. I could whittle this down a bit by using
> mappings suggested by the cross reference data found in
> NamesList.txt, though each would have to be hand-verified.
>
> For Big-5+HKSCS->GBK, the situation isn't so great. Out of the 18275
> characters in Big-5+HKSCS that I've got glyph data for, 2162 can't be
> mapped. Most of these (1598) are HKSCS characters that map to U+2xxxx
> code points.
After you do that, maybe you should try to fuzzy-map all Big5HKSCS
characters to ISO-8859-1 characters... I guess it won't be "more
difficult" than mapping all Big5HKSCS characters to GBK characters...
>
> So does anybody know of such a mapping table that already exists, or
> a suggestion for how to fuzzily resolve a significant number of the
> remaining unmapped HKSCS? I'm pretty sure somebody else has wrestled
> with this same problem.
>
> And yes, I realize this is a bit like trying to park a Cadillac in a closet :)
>
> Thanks,
>
> -- Ken
> --
> Ken Krugler
> TransPac Software, Inc.
> <http://www.transpac.com>
> +1 530-470-9200
>
>
-- Frank Yung-Fong Tang 譚永鋒 Šýšţém Årçĥîţéçţ