Re: Codepoint range for Traditional Chinese and Simplified Chinese

From: Mark Crispin (
Date: Sun Jun 13 2010 - 12:38:49 CDT

    On Sun, 13 Jun 2010, Ryan Chan wrote:
    > I want to use a regex to separate Traditional Chinese from
    > Simplified Chinese, so I would like to know their
    > corresponding codepoint ranges.
    > I have searched on Google but cannot find any; is this possible?

    As far as I know, the only way is to use the information in Unihan.txt,
    which probably does no good for a regex, since it lists individual
    characters rather than ranges. The CJK blocks (note the plural) are not
    divided into simplified vs. traditional.
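
    A minimal sketch of what "using the Unihan information" could look
    like, assuming the tab-separated layout of Unihan_Variants.txt and its
    kSimplifiedVariant / kTraditionalVariant fields. The function names
    parse_variants and char_class are made up for illustration, and the
    four data lines are a tiny sample, not the real file. The result is a
    character class of individual codepoints, not a range.

```python
import re

# Tiny sample in the assumed format of Unihan_Variants.txt:
# codepoint <TAB> field <TAB> one or more space-separated codepoints.
SAMPLE = """\
# comment lines start with '#'
U+4E1F\tkSimplifiedVariant\tU+4E22
U+4E22\tkTraditionalVariant\tU+4E1F
U+570B\tkSimplifiedVariant\tU+56FD
U+56FD\tkTraditionalVariant\tU+570B
"""

CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_variants(lines):
    """Return (traditional_forms, simplified_forms) as two sets of characters.

    A character that lists a kSimplifiedVariant is treated as a traditional
    form; one that lists a kTraditionalVariant is treated as a simplified form.
    A character may land in both sets, and most characters land in neither.
    """
    traditional, simplified = set(), set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        code, field, value = parts
        match = CODEPOINT.fullmatch(code)
        if match is None:
            continue
        char = chr(int(match.group(1), 16))
        targets = {chr(int(h, 16)) for h in CODEPOINT.findall(value)}
        if targets <= {char}:
            continue  # ignore entries that only point back at themselves
        if field == "kSimplifiedVariant":
            traditional.add(char)
        elif field == "kTraditionalVariant":
            simplified.add(char)
    return traditional, simplified


def char_class(chars):
    """Compile a regex character class matching any single character in chars."""
    if not chars:
        raise ValueError("cannot build a character class from an empty set")
    return re.compile("[" + "".join(re.escape(c) for c in sorted(chars)) + "]")


traditional, simplified = parse_variants(SAMPLE.splitlines())
traditional_re = char_class(traditional)
simplified_re = char_class(simplified)
```

    With the real file, pass the open file object to parse_variants
    instead of SAMPLE.splitlines(). Characters that appear in neither set
    are shared by both writing styles, so such a regex can only flag the
    characters that differ.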

    Also, simplified and traditional are not a binary state. Presumably you
    mean "Chinese as commonly written in China" vs. "Chinese as commonly
    written in Taiwan". However, there are many sub-variations; and if you
    consider the effect of other languages that use Han characters (most
    notably Japanese and Korean) then it gets even more complicated.
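
    To illustrate why this is not a binary state, here is a small sketch
    with two hand-picked character pairs (國/国 and 丟/丢). The sets and
    the classify function are hypothetical, chosen only for the example;
    a real table would come from the Unihan data.

```python
# Hand-picked illustration only: two traditional/simplified pairs.
TRADITIONAL_ONLY = {"國", "丟"}
SIMPLIFIED_ONLY = {"国", "丢"}


def classify(text):
    """Label text by which distinguishing characters it contains."""
    chars = set(text)
    has_traditional = bool(chars & TRADITIONAL_ONLY)
    has_simplified = bool(chars & SIMPLIFIED_ONLY)
    if has_traditional and has_simplified:
        return "mixed"
    if has_traditional:
        return "traditional"
    if has_simplified:
        return "simplified"
    # No distinguishing character: the text is valid in both styles.
    return "undetermined"


print(classify("中國"))    # traditional
print(classify("中国"))    # simplified
print(classify("中文"))    # undetermined
print(classify("日本国"))  # simplified
```

    The last call shows the complication with other languages: 国 is also
    the standard modern Japanese form, so a Japanese string gets labelled
    "simplified" even though it is not Chinese at all.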

    Good luck!

    -- Mark --
    Democracy is two wolves and a sheep deciding what to eat for lunch.
    Liberty is a well-armed sheep contesting the vote.
