Re: GB18030

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Sep 27 2001 - 18:01:29 EDT


Yung-Fong Tang wrote:
> ... But you
> still need to know what U+4FF3A is to define such a mapping table, right?

Wrong. You just need to know the mapping between code points, whether assigned, used, or whatever.

> ... So whatever software the user has today, without an
> upgrade (of either the code or the mapping table), still won't know how to
> convert U+4FF3A to lower case or upper case, right?

No, but that's irrelevant for character conversion. Once you update the Unicode character database in your product, your software will do it - if it knows how to deal with supplementary characters in general. (That part is a technicality which is, again, independent of whether there _are_ assigned characters.)

> But how can you generate such a mapping table without knowing that character?

By specifying which _code point_ in one encoding gets mapped to which other _code point_ in the other encoding.
Character conversion never looks at whether the code points that it maps are actual _characters_.
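To make this concrete for GB18030 itself: the standard gives every supplementary code point U+10000..U+10FFFF a four-byte sequence by plain arithmetic, starting with U+10000 at 90 30 81 30. Below is a minimal Python sketch of that mapping. The function and constant names are illustrative only, not from any library. It never consults a character database, so an unassigned code point such as U+4FF3A converts exactly like any other.

```python
# Sketch: GB18030 <-> Unicode for supplementary code points (U+10000..U+10FFFF).
# The mapping is arithmetic on code point numbers only; no character
# database is consulted, so unassigned code points map just as well.
# Names here are hypothetical, chosen for illustration.

SUPP_BASE = 0x10000   # first supplementary code point
SUPP_LINEAR = 189000  # linear index of the four-byte sequence 90 30 81 30


def gb18030_linear(b1: int, b2: int, b3: int, b4: int) -> int:
    """Linear index of a four-byte sequence (bytes 81-FE, 30-39, 81-FE, 30-39)."""
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def encode_supplementary(cp: int) -> bytes:
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    n = SUPP_LINEAR + (cp - SUPP_BASE)
    n, b4 = divmod(n, 10)   # last byte: 10 values (30..39)
    n, b3 = divmod(n, 126)  # third byte: 126 values (81..FE)
    b1, b2 = divmod(n, 10)  # second byte: 10 values; remainder is the lead byte
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


def decode_supplementary(seq: bytes) -> int:
    if len(seq) != 4:
        raise ValueError("expected a four-byte sequence")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a GB18030 four-byte sequence")
    cp = gb18030_linear(b1, b2, b3, b4) - SUPP_LINEAR + SUPP_BASE
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence is outside the supplementary range")
    return cp


for cp in (0x10000, 0x4FF3A, 0x10FFFF):
    seq = encode_supplementary(cp)
    assert decode_supplementary(seq) == cp
    print(f"U+{cp:04X} -> {seq.hex(' ').upper()}")
# U+10000 -> 90 30 81 30
# U+4FF3A -> A4 37 F1 36
# U+10FFFF -> E3 32 9A 35
```

CPython's built-in `gb18030` codec uses the same arithmetic for this range, so the results can be cross-checked with `chr(cp).encode('gb18030')`.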

When you map between the GBK or Shift-JIS user-defined areas and Unicode PUA or similar, then you also map code points that don't have characters. What's new?
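For comparison, here is the kind of user-defined-area table referred to above, sketched in Python under one specific assumption: Microsoft's CP932 (Windows Shift-JIS) convention, which maps lead bytes F0..F9 with the usual Shift-JIS trail bytes onto U+E000..U+E757. The helper names are hypothetical. Nothing in it knows or cares what characters, if any, sit at those positions; the table is defined purely by position.

```python
# Sketch: Shift-JIS user-defined area <-> Unicode Private Use Area,
# assuming Microsoft's CP932 convention (F040..F9FC <-> U+E000..U+E757).
# Other Shift-JIS variants may treat this range differently.

LEAD_FIRST, LEAD_LAST = 0xF0, 0xF9
PUA_BASE = 0xE000
TRAILS_PER_LEAD = 188  # trail bytes 40..7E (63) plus 80..FC (125)
PUA_END = PUA_BASE + (LEAD_LAST - LEAD_FIRST + 1) * TRAILS_PER_LEAD  # exclusive


def sjis_udc_to_pua(lead: int, trail: int) -> int:
    if not LEAD_FIRST <= lead <= LEAD_LAST:
        raise ValueError("lead byte outside the user-defined area")
    if 0x40 <= trail <= 0x7E:
        index = trail - 0x40
    elif 0x80 <= trail <= 0xFC:
        index = trail - 0x41  # skip the unused trail byte 0x7F
    else:
        raise ValueError("invalid Shift-JIS trail byte")
    return PUA_BASE + (lead - LEAD_FIRST) * TRAILS_PER_LEAD + index


def pua_to_sjis_udc(cp: int) -> bytes:
    if not PUA_BASE <= cp < PUA_END:
        raise ValueError(f"U+{cp:04X} is outside the mapped PUA range")
    row, index = divmod(cp - PUA_BASE, TRAILS_PER_LEAD)
    trail = index + 0x40 if index < 0x3F else index + 0x41
    return bytes([LEAD_FIRST + row, trail])


print(f"F0 40 -> U+{sjis_udc_to_pua(0xF0, 0x40):04X}")  # F0 40 -> U+E000
print(f"F9 FC -> U+{sjis_udc_to_pua(0xF9, 0xFC):04X}")  # F9 FC -> U+E757
```

Python's own `cp932` codec appears to follow the same convention, so `b'\xf0\x40'.decode('cp932')` can serve as a cross-check.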

> ...
> How many years does it take for people to realize that giving a new mapping to
> their customers still needs a complete cycle of QA and distribution? And
> a new version number will be attached to the software for that.

Is this about the existence of supplementary characters again?
They have existed since 1996, and a vendor who followed the UTC/ISO negotiations could have seen them coming since 1993.
Surely almost everyone has had time, in more than five years, to roll out a new release of their software with support for them?

(I know that few actually worked on this in time. But time there was.)

markus
