Strange EUDC to Unicode mapping

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Aug 12 1999 - 14:29:33 EDT


Can someone from Microsoft answer the following questions?
1. Simplified Chinese
The "Chinese EUDC Ranges" page,
http://msdn.microsoft.com/library/sdkdoc/winbase/eudc_0shf.htm, states
that the EUDC ranges of cp936 (an extension of GB2312) map to the
Unicode PUA as follows:

F8A1-FEFE U+E000-U+E29F
AAA1-AFFE U+E2A0-U+E4DF

However, the arithmetic here looks wrong.
Assuming the trail (second) bytes range from 0xA1 to 0xFE:
0xFE(hex) - 0xA1(hex) + 1 = 94 (dec) cells per row
0xFE(hex) - 0xF8(hex) + 1 = 7 rows
7 * 94 = 658(dec) = 292(hex)
0x0292 + 0xE000 - 1 = 0xE291
Therefore, FEFE should map to U+E291, not U+E29F.
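The same arithmetic as a short Python check, with the "+1" for counting an inclusive byte range made explicit. It assumes, as the message does, 94 trail bytes (0xA1-0xFE) per row and row-major assignment starting at U+E000.

```python
def inclusive_count(lo: int, hi: int) -> int:
    """Number of values in the inclusive range lo..hi."""
    return hi - lo + 1


cells_per_row = inclusive_count(0xA1, 0xFE)  # trail bytes A1..FE -> 94
rows = inclusive_count(0xF8, 0xFE)           # lead bytes F8..FE -> 7
total = rows * cells_per_row                 # 658 == 0x292
last_pua = 0xE000 + total - 1                # last code point of the range

print(cells_per_row, rows, hex(total), f"U+{last_pua:04X}")
# prints: 94 7 0x292 U+E291
```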

The same applies to AAA1-AFFE.
Assuming the trail bytes range from 0xA1 to 0xFE:
0xFE(hex) - 0xA1(hex) + 1 = 94 (dec) cells per row
0xAF(hex) - 0xAA(hex) + 1 = 6 rows
6 * 94 = 564(dec) = 234(hex)
0x0234 + 0xE2A0 - 1 = 0xE4D3
Therefore, AFFE should map to U+E4D3, not U+E4DF.
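As an independent cross-check, one can enumerate every code in AAA1-AFFE instead of multiplying rows by cells. This sketch assumes, as the message does, that only trail bytes 0xA1-0xFE are valid in each row; the helper name `codes_in_range` is illustrative.

```python
def codes_in_range(first: int, last: int, trail_lo: int = 0xA1, trail_hi: int = 0xFE):
    """Yield every double-byte code from first to last (inclusive), row by row,
    skipping trail bytes outside trail_lo..trail_hi."""
    for lead in range(first >> 8, (last >> 8) + 1):
        for trail in range(trail_lo, trail_hi + 1):
            code = (lead << 8) | trail
            if first <= code <= last:
                yield code


count = sum(1 for _ in codes_in_range(0xAAA1, 0xAFFE))
last_pua = 0xE2A0 + count - 1  # PUA range starts at U+E2A0 per the table
print(count, hex(count), f"U+{last_pua:04X}")
# prints: 564 0x234 U+E4D3
```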

2. Traditional Chinese
The same page states that the Big5 EUDC ranges map to the Unicode PUA
as follows:

FA40-FEFE U+E000-U+E310
8E40-A0FE U+E311-U+EEB7
8140-8DFE U+EEB8-U+F6B0
C6A1-C8FE U+F6B1-U+F8FF
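The first three rows of this table can be checked mechanically. The sketch below assumes Big5 trail bytes 0x40-0x7E and 0xA1-0xFE and row-major assignment of PUA code points; it checks that each of those ranges ends where the table says. The helper `big5_count` is illustrative.

```python
# Valid Big5 trail bytes: 0x40-0x7E and 0xA1-0xFE (157 per row).
BIG5_TRAILS = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))


def big5_count(first: int, last: int) -> int:
    """Count Big5 codes from first to last inclusive, row by row."""
    return sum(
        1
        for lead in range(first >> 8, (last >> 8) + 1)
        for trail in BIG5_TRAILS
        if first <= ((lead << 8) | trail) <= last
    )


# (Big5 first, Big5 last, PUA first, PUA last) as listed in the table
TABLE = [
    (0xFA40, 0xFEFE, 0xE000, 0xE310),
    (0x8E40, 0xA0FE, 0xE311, 0xEEB7),
    (0x8140, 0x8DFE, 0xEEB8, 0xF6B0),
]

for b_first, b_last, u_first, u_last in TABLE:
    expected_last = u_first + big5_count(b_first, b_last) - 1
    status = "ok" if expected_last == u_last else f"mismatch, expected U+{expected_last:04X}"
    print(f"{b_first:04X}-{b_last:04X} -> U+{u_first:04X}-U+{u_last:04X}: {status}")
```

All three rows report "ok": 5, 19 and 13 full rows of 157 cells give 785, 2983 and 2041 code points.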

However, U+F8FF is definitely wrong as the mapping for C8FE.
In Big5, the second (trail) bytes are in the ranges 0x40-0x7E and
0xA1-0xFE, so each full row has 157 characters:

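0x7E-0x40+1 + 0xFE-0xA1+1 = 63 + 94 = 157 (dec)
The range starts at C6A1, so row 0xC6 contributes only its 94 cells
A1-FE, while rows 0xC7 and 0xC8 are complete:
94 + 2*157 = 408 (dec) = 0x198 (hex)
0xF6B1 + 0x198 - 1 = 0xF848
Therefore, C8FE should map to U+F848, not U+F8FF.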
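Counting the codes one by one, under the same assumptions (Big5 trail bytes 0x40-0x7E and 0xA1-0xFE, row-major PUA assignment starting at U+F6B1), gives the end of the C6A1-C8FE range. Because the range begins at C6A1, the cells C640-C67E of row 0xC6 fall outside it, so that row contributes 94 codes rather than 157. The helper name `big5_codes` is illustrative.

```python
# Valid Big5 trail bytes: 0x40-0x7E and 0xA1-0xFE.
BIG5_TRAILS = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))


def big5_codes(first: int, last: int):
    """Yield Big5 codes from first to last inclusive, in row-major order."""
    for lead in range(first >> 8, (last >> 8) + 1):
        for trail in BIG5_TRAILS:
            code = (lead << 8) | trail
            if first <= code <= last:  # drops C640-C67E, which precede C6A1
                yield code


codes = list(big5_codes(0xC6A1, 0xC8FE))
pua_last = 0xF6B1 + len(codes) - 1
print(len(codes), hex(len(codes)), f"U+{pua_last:04X}")
# prints: 408 0x198 U+F848
```

Multiplying 3 rows by 157 instead would count the 63 excluded cells in row 0xC6 and overshoot the end of the range.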

Is that correct?





This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT