GB2312 encoding

From: Tom Emerson (tree@basistech.com)
Date: Wed Sep 20 2000 - 09:18:42 EDT


Viswanathan S writes:
> In a HTML page encoded using "gb2312" character encoding ,
> how to distinguish ASCII characters from gb2312 characters ?

Don't confuse the row-cell value of a GB 2312 character with its
encoded form (EUC-CN, usually just called "GB"). In a GB encoded
file, ASCII/GB-Roman characters are single bytes below 0x80; every
other character is two bytes, each in the range 0xA1 - 0xFE.
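
A rough Python sketch of how a scanner could use that byte range to
separate ASCII from GB characters. The function name split_gb is made
up for illustration, and the sketch assumes the input really is
EUC-CN; it raises on anything else rather than guessing:

```python
def split_gb(data: bytes):
    """Yield ('ascii', chunk) or ('gb', chunk) for each character in data.

    Assumes data is EUC-CN ("GB") encoded: bytes below 0x80 are
    ASCII/GB-Roman, and every other character is a pair of bytes that
    both fall in the range 0xA1-0xFE.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII/GB-Roman character.
            yield ("ascii", data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= data[i + 1] <= 0xFE:
            # Two-byte GB 2312 character.
            yield ("gb", data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid EUC-CN byte 0x{b:02X} at offset {i}")
```

Note that the test has to be done while walking the bytes in order: a
byte in 0xA1 - 0xFE may be either the first or the second half of a
character, so you cannot classify it in isolation.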

So, for example, yi1 has row-cell 50-27 in GB 2312-80, but is encoded
as 0xD2BB in a GB encoded file. To convert from row-cell to the
encoded bytes, add 0xA0 to both the row and the cell (50 + 0xA0 =
0xD2, 27 + 0xA0 = 0xBB); subtract 0xA0 from each byte to go back.
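
The conversion can be sketched in Python as follows. The helper names
rowcell_to_euc and euc_to_rowcell are hypothetical, and the 1-94
bounds check assumes the usual 94x94 GB 2312 grid:

```python
def rowcell_to_euc(row: int, cell: int) -> bytes:
    """Convert a GB 2312 row-cell pair to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Adding 0xA0 maps 1..94 onto 0xA1..0xFE.
    return bytes([row + 0xA0, cell + 0xA0])


def euc_to_rowcell(pair: bytes) -> tuple:
    """Convert two EUC-CN bytes back to a (row, cell) pair."""
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected two bytes in the range 0xA1..0xFE")
    return pair[0] - 0xA0, pair[1] - 0xA0


# yi1: row-cell 50-27 <-> 0xD2BB
encoded = rowcell_to_euc(50, 27)
print(encoded.hex().upper())    # D2BB
print(euc_to_rowcell(encoded))  # (50, 27)
```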

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"


