Viswanathan S writes:
> In a HTML page encoded using "gb2312" character encoding ,
> how to distinguish ASCII characters from gb2312 characters ?
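Don't confuse the row-cell value of a GB 2312 character with its
encoded form (EUC-CN, usually just called "GB"). In a GB-encoded file,
every character outside ASCII/GB-Roman takes two bytes, and both bytes
are in the range 0xA1 - 0xFE. ASCII/GB-Roman bytes are all below 0x80,
so the high bit of a byte tells the two apart.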
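A rough sketch of that test in Python, in case it helps. The function
name split_gb and the "ascii"/"gb"/"invalid" labels are made up for
illustration, and it assumes the input really is EUC-CN bytes (not
GBK or some other superset):

```python
def split_gb(data: bytes):
    """Yield (kind, raw_bytes) for each character in EUC-CN data.

    kind is "ascii" for a single byte below 0x80, "gb" for a pair of
    bytes that are both in 0xA1..0xFE, and "invalid" for anything else.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # High bit clear: a single ASCII/GB-Roman byte.
            yield ("ascii", data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # Lead and trail byte both in range: one GB 2312 character.
            yield ("gb", data[i:i + 2])
            i += 2
        else:
            # Stray or truncated byte; report it and move on.
            yield ("invalid", data[i:i + 1])
            i += 1
```

Note that you have to walk the data from the start (or from a known
character boundary), since a byte in 0xA1 - 0xFE can be either the
first or the second half of a character.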
So, for example, yi1 has row-cell 50-27 in GB 2312-80, but is encoded
as 0xD2BB in a GB-encoded file (50 + 0xA0 = 0xD2, 27 + 0xA0 = 0xBB).
To convert from row-cell to the encoded bytes, add 0xA0 to both the
row and the cell; subtract 0xA0 from each byte to go the other way.
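The conversion is small enough to sketch in Python. The helper names
rowcell_to_bytes and bytes_to_rowcell are hypothetical, and the 1..94
bounds check simply follows from the 0xA1 - 0xFE byte range:

```python
def rowcell_to_bytes(row: int, cell: int) -> bytes:
    """Convert a GB 2312 row-cell pair to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Adding 0xA0 maps 1..94 onto 0xA1..0xFE.
    return bytes([row + 0xA0, cell + 0xA0])


def bytes_to_rowcell(encoded: bytes) -> tuple[int, int]:
    """Convert two EUC-CN bytes back to a GB 2312 row-cell pair."""
    if len(encoded) != 2 or not all(0xA1 <= b <= 0xFE for b in encoded):
        raise ValueError("expected two bytes in the range 0xA1..0xFE")
    return encoded[0] - 0xA0, encoded[1] - 0xA0
```

With these, rowcell_to_bytes(50, 27) gives the bytes 0xD2 0xBB, and
bytes_to_rowcell(b"\xd2\xbb") gives back (50, 27).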
-tree
-- 
Tom Emerson                                 Basis Technology Corp.
Zenkaku Language Hacker                  http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"