RE: GB18030

From: Sampo Syreeni (decoy@iki.fi)
Date: Fri Sep 21 2001 - 15:15:22 EDT


On Fri, 21 Sep 2001, Carl W. Brown wrote:

>Most systems that handle GB18030 will want to convert it to Unicode first
>to reduce processing overhead.

Unless we start seeing Chinese software which is designed to utilize the
compatibility between 18030 and GBK -- font rendering apps and the influence
such OS level functionality tends to have on common APIs immediately come to
mind.

Besides, if the Chinese for any reason get bored enough with the Unicode
and/or ISO character allocation process, they might indeed start assigning
some of those extra code points in 18030. If this ever happens, the
incompatibility might well lead to a significant mass of software with 18030
as the primary character set.

>With GB18030 you sometimes have to check the first two characters.
>UTF-8 for example is an MBCS character set but if I am going backwards
>through a string I can do so. With GB18030 I must start over from the
>beginning of the string to find the start of the previous character.

Actually I think the previous line feed will buy you a sync.

Still, that is a *very* bad thing, especially since we know that many of
the earlier ISO 2022 derived multibyte codings had problems with string
search and similar functionality, problems which were all but solved by
UTF-8. It'd be a real shame to see progress towards encodings which force
people to again devote time to something that has already been solved once.
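
For contrast, a minimal sketch of the same backward step in UTF-8, again
assuming well-formed input. Continuation bytes always have the bit pattern
10xxxxxx, so the start of the previous character can be found locally with no
forward scan. The helper name is hypothetical.

```python
def utf8_prev_start(buf: bytes, pos: int) -> int:
    """Start index of the character that ends just before pos."""
    if pos <= 0:
        raise ValueError("no character before position 0")
    i = pos - 1
    # Skip back over continuation bytes (10xxxxxx) until a lead byte
    # or a plain ASCII byte is reached.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "GB 汉字".encode("utf-8")
start = utf8_prev_start(data, len(data))
print(data[start:].decode("utf-8"))
```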

>It is smaller than UTF-8 for Chinese and larger for anyone else.

But you'll have to concede that this is a significant point, especially if
people perceive UTF-8 coded Chinese as being unacceptably large compared to
existing Chinese encodings (GB, Big Five, now 18030). A billion people, and
so forth...
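
The size difference is easy to check for oneself. The following is only a
sketch using Python's standard "utf-8" and "gb18030" codecs; the helper name
and the sample strings are my own picks, not anything from the standard.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, GB18030 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("gb18030"))


samples = {
    "Chinese": "中文编码",   # common hanzi: two bytes each in GB18030
    "English": "encoding",   # ASCII: one byte each in both encodings
    "Hebrew": "שלום",        # outside the two-byte area: four bytes each
}

for name, text in samples.items():
    utf8_len, gb_len = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_len} bytes, GB18030 {gb_len} bytes")
```

For scripts that sit in the two-byte area without being Chinese (basic Greek
and Cyrillic, for instance) the two encodings should come out about even, so
"larger for anyone else" is not quite the whole story.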

Sampo Syreeni, aka decoy, mailto:decoy@iki.fi, gsm: +358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front


