Re: GB18030

From: David Starner (dstarner98@aasaa.ofe.org)
Date: Wed Sep 26 2001 - 23:24:22 EDT


On Wed, Sep 26, 2001 at 06:17:15PM -0700, Yung-Fong Tang wrote:
> Sure, Unicode defined those planes, but defining planes without defining the characters in them doesn't mean much to people. How can
> you implement case conversion or property mapping without knowing what is inside?

How do you do that for BMP characters? There's a whole lot you can do
without knowing the identity of a character. You can draw the glyph from
a font, which will suffice for a lot of purposes.

> In particular, DOES GB18030 define a code point to
> code point mapping (beyond the BMP) with Unicode? Unless you can say YES and show me the specification for how to map between
> them, there is no way people can implement code set conversion between GB18030 and Unicode.

Have you looked for the specification? Or are you just going to complain
on the list?

According to GNU libc, the algorithm for converting a Unicode character
ch outside the BMP into the four GB18030 bytes outptr (1 .. 4) is:

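        --  Shift ch so that U+10000 lands on bytes 16#90# 16#30# 16#81# 16#30#.
        idx := ch + 16#1E248#;
        --  Fourth byte: 16#30# .. 16#39# (the remainder, not the quotient).
        outptr (4) := (idx mod 10) + 16#30#;
        idx := idx / 10;
        --  Third byte: 16#81# .. 16#FE#.
        outptr (3) := (idx mod 126) + 16#81#;
        idx := idx / 126;
        --  Second byte: 16#30# .. 16#39#.
        outptr (2) := (idx mod 10) + 16#30#;
        --  First byte: 16#90# .. 16#E3# for U+10000 .. U+10FFFF.
        outptr (1) := (idx / 10) + 16#81#;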
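For anyone who wants to try the arithmetic, here is a minimal Python sketch of the same mapping, with the inverse added so it covers conversion in both directions. The names encode_supplementary and decode_supplementary are made up for this example; the byte ranges checked (first and third bytes 0x81..0xFE, second and fourth 0x30..0x39) are the standard GB18030 four-byte layout, and the self-check compares against Python's built-in "gb18030" codec.

```python
# Sketch of the GB18030 four-byte mapping for U+10000..U+10FFFF, following the
# arithmetic quoted from GNU libc. Function names are hypothetical.

OFFSET = 0x1E248  # linear index of bytes 90 30 81 30 (189000) minus 0x10000


def encode_supplementary(ch: int) -> bytes:
    """Encode a supplementary code point as four GB18030 bytes."""
    if not 0x10000 <= ch <= 0x10FFFF:
        raise ValueError(f"U+{ch:04X} is not a supplementary code point")
    idx = ch + OFFSET
    b4 = idx % 10 + 0x30   # fourth byte: remainder mod 10
    idx //= 10
    b3 = idx % 126 + 0x81  # third byte: remainder mod 126
    idx //= 126
    b2 = idx % 10 + 0x30   # second byte: remainder mod 10
    b1 = idx // 10 + 0x81  # first byte: what is left
    return bytes([b1, b2, b3, b4])


def decode_supplementary(seq: bytes) -> int:
    """Invert encode_supplementary: four GB18030 bytes back to a code point."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"{seq.hex()} is not a GB18030 four-byte sequence")
    # Rebuild the linear index by undoing each div/mod step in reverse order.
    idx = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    ch = idx - OFFSET
    if not 0x10000 <= ch <= 0x10FFFF:
        raise ValueError(f"{seq.hex()} does not encode a supplementary code point")
    return ch


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        enc = encode_supplementary(cp)
        # Cross-check against Python's own gb18030 codec.
        assert enc == chr(cp).encode("gb18030")
        assert decode_supplementary(enc) == cp
        print(f"U+{cp:04X} -> {enc.hex().upper()}")
```

With this arithmetic U+10000 comes out as 90 30 81 30 and U+10FFFF as E3 32 9A 35, so the whole supplementary range fits in first bytes 0x90..0xE3.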
 

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"



This archive was generated by hypermail 2.1.2 : Wed Sep 26 2001 - 22:10:19 EDT