RE: GBK Traditional to Simplified mapping table

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Fri Jan 11 2002 - 04:57:08 EST

Previous message: jgo: "RE: Unicode fonts"
Maybe in reply to: Ken Krugler: "GBK Traditional to Simplified mapping table"
Next in thread: Tom Emerson: "RE: GBK Traditional to Simplified mapping table"
Reply: Tom Emerson: "RE: GBK Traditional to Simplified mapping table"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Doug Ewell wrote:
> [...] Far from being a simple operation like Latin
> case mapping (to which it was compared), TC/SC
> requires potentially complex analysis of the text
> being converted.
>
> This is the opinion of many experts within, as well as
> outside, the Unicode standardization effort, and it is
> the reason you will not find a Unicode TC/SC mapping
> table.

Actually, such a table can easily be extracted from Unicode's Unihan
database (a huge file: <http://www.unicode.org/Public/UNIDATA/Unihan.txt>).

The relevant information for TC->SC is field <kSimplifiedVariant>, and for
SC->TC is field <kTraditionalVariant>.

As each field is on a separate line, the information can be extracted quite
simply, such as with the DOS command:

find "kSimplifiedVariant" Unihan.txt > kSimplifiedVariant.txt
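
For anyone who wants a lookup table rather than a filtered text file, here is
a minimal Python sketch of the same extraction. It assumes each data line has
the form "U+XXXX<TAB>field<TAB>value", that a value may list several
space-separated code points, and that a code point may carry a "<source" tag
that can be dropped. The function name and the three sample lines are only
illustrative; this is not a complete Unihan parser.

```python
def parse_variant_field(lines, field="kSimplifiedVariant"):
    """Return {character: [variant characters]} for one Unihan field.

    Assumes tab-separated lines: code point, field name, value.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # not the field we are looking for
        source = chr(int(parts[0][2:], 16))  # "U+4E1F" -> character
        variants = []
        for token in parts[2].split():
            code = token.split("<", 1)[0]  # drop an optional "<source" tag
            if code.startswith("U+"):
                variants.append(chr(int(code[2:], 16)))
        if variants:
            table[source] = variants
    return table


# Illustrative sample lines in the assumed format.
sample = [
    "# comment line",
    "U+4E1F\tkSimplifiedVariant\tU+4E22",
    "U+53D1\tkTraditionalVariant\tU+767C U+9AEE",
]

tc_to_sc = parse_variant_field(sample, "kSimplifiedVariant")
sc_to_tc = parse_variant_field(sample, "kTraditionalVariant")
```

To run it on the real file, something like
parse_variant_field(open("Unihan.txt", encoding="utf-8")) should work, assuming
the file is UTF-8. Values are kept as lists because one simplified character
can point back to more than one traditional character.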

As Doug explained, though, this 1-to-1 data is NOT suitable for a
full-fledged conversion. Still, it may be a good starting point for
more complex approaches.

It can also prove useful for implementing things such as a user-friendly
search function that would match any variant of the sought characters. In
this respect, Unihan contains two more fields that may be useful:
<kSemanticVariant> and <kSpecializedSemanticVariant>.
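
A minimal Python sketch of such a variant-insensitive search, assuming the
variant pairs have already been pulled out of the fields above. Characters
linked by any pair are merged into one group, and both the text and the query
are folded to a group representative before an ordinary substring search. All
names here are hypothetical, and the two sample pairs are only illustrative.

```python
def build_fold_map(pairs):
    """Map each character to one representative of its variant group."""
    parent = {}

    def find(ch):
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Use the lower code point as the group representative.
            parent[max(ra, rb)] = min(ra, rb)
    return {ch: find(ch) for ch in list(parent)}


def fold(text, fold_map):
    """Replace every character by its group representative."""
    return "".join(fold_map.get(ch, ch) for ch in text)


def variant_search(haystack, needle, fold_map):
    """Return the index of the first variant-insensitive match, or -1."""
    return fold(haystack, fold_map).find(fold(needle, fold_map))


# Illustrative pairs: two traditional characters sharing one simplified form.
pairs = [("\u767c", "\u53d1"), ("\u9aee", "\u53d1")]
fold_map = build_fold_map(pairs)

# Both the simplified and the "other" traditional form match U+9AEE.
hit_simplified = variant_search("\u982d\u9aee", "\u53d1", fold_map)
hit_traditional = variant_search("\u982d\u9aee", "\u767c", fold_map)
```

Folding is character-for-character, so the returned index also applies to the
original text. The matching is deliberately loose: it treats every member of a
group as equal, which is fine for search but is exactly why the same data is
not enough for a real conversion.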

_ Marco

