From: Mark Crispin (mrc+unicode@panda.com)
Date: Sun Jun 13 2010 - 12:38:49 CDT
On Sun, 13 Jun 2010, Ryan Chan wrote:
> I want to use a regex to separate Traditional Chinese from
> Simplified Chinese, so I would like to know their
> corresponding code points.
>
> I have searched Google but cannot seem to find a list. Is it possible?
As far as I know, the only way is to use the information in Unihan.txt,
which is per-character data and so probably does no good for a regex. The
CJK blocks (note the plural) are not divided into simplified and
traditional ranges.
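
For what it is worth, here is a rough sketch in Python of how the Unihan
variant fields (kSimplifiedVariant and kTraditionalVariant) might be used
instead of code point ranges. The sample lines, the names load_variants and
count_forms, and the counting heuristic are all assumptions made up for
illustration; the real data has many more entries, and some entries carry
several values or list a character as its own variant.

```python
# Hypothetical excerpt in the tab-separated Unihan layout:
# code point <TAB> field name <TAB> value(s). Not the full data file.
SAMPLE = (
    "# comment lines start with '#'\n"
    "U+4E07\tkTraditionalVariant\tU+842C\n"
    "U+842C\tkSimplifiedVariant\tU+4E07\n"
    "U+56FD\tkTraditionalVariant\tU+570B\n"
    "U+570B\tkSimplifiedVariant\tU+56FD\n"
)


def load_variants(text):
    """Return (simplified_forms, traditional_forms) as sets of characters.

    A character that lists a kTraditionalVariant is treated as a simplified
    form; one that lists a kSimplifiedVariant is treated as a traditional
    form. This is an assumption of the sketch, not a rule from the standard.
    """
    simplified_forms = set()
    traditional_forms = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a three-column record
        code, field, _values = parts
        ch = chr(int(code[2:], 16))  # "U+4E07" -> the character itself
        if field == "kTraditionalVariant":
            simplified_forms.add(ch)
        elif field == "kSimplifiedVariant":
            traditional_forms.add(ch)
    return simplified_forms, traditional_forms


def count_forms(text, simplified_forms, traditional_forms):
    """Count characters found only in one set; shared ones are ignored."""
    only_simplified = simplified_forms - traditional_forms
    only_traditional = traditional_forms - simplified_forms
    s = sum(ch in only_simplified for ch in text)
    t = sum(ch in only_traditional for ch in text)
    return s, t
```

Most Han characters are shared by both writing styles and appear in neither
set, so counts like these can only hint at which convention a text follows.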
Also, simplified and traditional are not a binary state. Presumably you
mean "Chinese as commonly written in China" vs. "Chinese as commonly
written in Taiwan". However, there are many sub-variations; and if you
consider the effect of other languages that use Han characters (most
notably Japanese and Korean) then it gets even more complicated.
Good luck!
-- Mark --
http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.