Re: GB18030

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Sep 27 2001 - 16:29:22 EDT


Kenneth Whistler wrote:

> Frank,
>
> > You don't need to explain to me
> > the concept of GB18030. My question is about the detailed mapping
> > information.
>
> Now, now, there's no need to get snippy with me. It sounded
> like you were unclear from the kinds of questions you were
> asking.

Sorry for that. I did not mean any flame in my message.

> > I look at
> > http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
> >
> > It is interesting that the mapping between U+10000 and U+10FFFF was checked
> > in only 5 weeks ago, in version 1.3:
> >
> >   <range uFirst="10000" uLast="10FFFF"
> >          bFirst="90 30 81 30" bLast="E3 32 9A 35"
> >          bMin="81 30 81 30" bMax="FE 39 FE 39"/>
> >
>
> > Is the U+10000 - U+10FFFF mapping between Unicode and GB18030 specified
> > in the GB18030 standard itself? Can someone fax me that page? Thanks.
>
> Unfortunately, I don't have the revised and corrected version of
> the standard to hand.

Is it possible for you to fax me the old original version? My fax number is
+1 650 937 5413. Thanks.

> But on p. 5, clause 7.3 of the original GB 18030-2000, it states (in
> Chinese):
>
> "From 0x90308130 to 0xE339FE39, altogether 1058400 code points, correspond
> to GB 13000's 16 supplementary planes..."

Thank you very much. This is the information I need. It clearly defines the
mapping between GB18030 and the Unicode supplementary planes at the character
level. With this information, we can implement the conversion between
GB18030 and Unicode.
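
A minimal sketch of that conversion, assuming the usual GB 18030 four-byte
layout (first and third bytes 0x81..0xFE, second and fourth bytes 0x30..0x39)
and that, per clause 7.3, U+10000 starts at 90 30 81 30 and the code points
are laid out in order. The function names are hypothetical, chosen only for
illustration:

```python
# Sketch only: assumes the four-byte layout b1 0x81..0xFE, b2 0x30..0x39,
# b3 0x81..0xFE, b4 0x30..0x39. Names are hypothetical.

SUPP_FIRST = bytes.fromhex("90308130")  # GB 18030 code for U+10000 (clause 7.3)


def _to_linear(b: bytes) -> int:
    """Position of a four-byte code within the whole four-byte space."""
    if len(b) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = b
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a GB 18030 four-byte sequence")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def _from_linear(n: int) -> bytes:
    """Inverse of _to_linear: split a position back into four bytes."""
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    b1, b2 = divmod(n, 10)
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


_BASE = _to_linear(SUPP_FIRST)


def encode_supplementary(cp: int) -> bytes:
    """U+10000..U+10FFFF -> GB 18030 four-byte sequence, in code point order."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    return _from_linear(_BASE + (cp - 0x10000))


def decode_supplementary(b: bytes) -> int:
    """GB 18030 four-byte sequence -> U+10000..U+10FFFF."""
    cp = 0x10000 + (_to_linear(b) - _BASE)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("outside the range mapped to supplementary planes")
    return cp


print(encode_supplementary(0x10000).hex(" "))   # 90 30 81 30
print(encode_supplementary(0x10FFFF).hex(" "))  # e3 32 9a 35
```

Because this range is purely algorithmic, no lookup table is needed, and the
end point agrees with the ICU bLast value. The result can be cross-checked
against Python's built-in "gb18030" codec, e.g. chr(0x1F600).encode("gb18030").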

> If you look at the ICU specification, bFirst="90 30 81 30" and
> bLast="E3 32 9A 35" corresponds to:
>
> 83 "groups" (90..E2) of GB 18030: 83 x 10 x 1260 = 1045800 code points
> 2 "planes" (E3 30..31) of GB 18030: 2 x 1260 = 2520 code points
> 25 "rows" (E3 32 81..99) of GB 18030: 25 x 10 = 250 code points
> 6 "cells" (E3 32 9A 30..35) of GB 18030: 6 code points
> Total 1048576 code points
>
> And 1048576 code points = 16 x 65536 code points = 16 planes of 10646.
>
> So GB 18030 and ICU agree. Start at 0x90308130 and lay out all the
> rest of the Unicode supplementary code points in order.
>
> --Ken
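
For my own check, here is Ken's arithmetic redone in a few lines, assuming the
same four-byte layout (b1 0x81..0xFE, b2 0x30..0x39, b3 0x81..0xFE,
b4 0x30..0x39). It covers both his group/plane/row/cell breakdown of the ICU
range and the 1058400 figure from clause 7.3. The constant names are mine, for
illustration only:

```python
# Sizes of the nested units of the GB 18030 four-byte space, using Ken's terms.
CELLS_PER_ROW = 10                        # b4: 0x30..0x39
CELLS_PER_PLANE = 126 * CELLS_PER_ROW     # b3: 0x81..0xFE -> 1260
CELLS_PER_GROUP = 10 * CELLS_PER_PLANE    # b2: 0x30..0x39 -> 12600

# Breakdown of ICU's bFirst..bLast = 90 30 81 30 .. E3 32 9A 35
groups = (0xE2 - 0x90 + 1) * CELLS_PER_GROUP   # 83 groups, 90..E2
planes = (0x31 - 0x30 + 1) * CELLS_PER_PLANE   # 2 planes, E3 30..31
rows = (0x99 - 0x81 + 1) * CELLS_PER_ROW       # 25 rows, E3 32 81..99
cells = 0x35 - 0x30 + 1                        # 6 cells, E3 32 9A 30..35
icu_total = groups + planes + rows + cells

# Whole range quoted from clause 7.3: 90 30 81 30 .. E3 39 FE 39
clause_total = (0xE3 - 0x90 + 1) * CELLS_PER_GROUP

print(icu_total, icu_total == 16 * 65536)   # 1048576 True
print(clause_total)                          # 1058400
```

The clause 7.3 range is 9824 codes larger than the 1048576 needed for
U+10000..U+10FFFF, which is why ICU's bLast stops at E3 32 9A 35 rather than
at the end of that range.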



This archive was generated by hypermail 2.1.2 : Thu Sep 27 2001 - 15:18:02 EDT