Re: IBM AIX 5 and GB18030

From: Markus Scherer (
Date: Thu Nov 14 2002 - 12:18:16 EST


    Carl W. Brown wrote:
    > Some Unix systems adapted faster because the later Unicode adopters used 32-bit
    > Unicode characters, making the job 100 times easier. Other companies
    > like Microsoft took a very big gamble and implemented the code for surrogate
    > support into Windows 2000 based on early drafts of the Unicode standard. If
    > they had not done it this way or had guessed wrong they might not even have
    > support in Windows XP.

    Hi Carl, I am not going to argue with what you say about ICU :-), but I am not sure about your
    Unix comments.

    First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW locale, as far as I know.
    (AIX 5 zh_TW uses a different wchar_t encoding.)

    Again as far as I know, Unix/Linux systems chose to use 32-bit wchar_t not because of great
    strategic plans or compelling performance analysis, but because the existing C stdlib functions for
    wchar_t string handling assume that the type for a single code point is the same as the string's
    base unit: functions like towupper() and wcschr() take one wchar_t (or wint_t) and treat it as a whole
    character. This one design point requires 32-bit wchar_t not just for Unicode but also for the
    EUC-TW and GB18030 character sets, both of which have more characters than fit into 16 bits.
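
    To make that design point concrete, here is a minimal sketch in Python (not real C library code;
    all helper names are hypothetical) that models code units as integers. A towupper()-style loop
    handles one unit at a time, so it only works if every unit is a whole code point.

```python
# Hypothetical model of the wchar_t design point: C functions such as
# towupper() take ONE unit and return ONE unit, so string loops go unit by unit.

def utf32_units(text: str) -> list[int]:
    """32-bit units: exactly one unit per code point."""
    return [ord(ch) for ch in text]

def utf16_units(text: str) -> list[int]:
    """16-bit units: supplementary code points become surrogate pairs."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def is_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDFFF

def units_that_are_whole_characters(units: list[int]) -> int:
    """How many units a per-unit function would see as complete characters."""
    return sum(1 for u in units if not is_surrogate(u))

# U+4E2D is in the BMP; U+20000 (CJK Ext. B) is a supplementary code point.
text = "\u4e2d\U00020000"
print(len(utf32_units(text)), units_that_are_whole_characters(utf32_units(text)))  # 2 2
print(len(utf16_units(text)), units_that_are_whole_characters(utf16_units(text)))  # 3 1
```

    With 32-bit units the per-unit loop is correct for every code point; with 16-bit units it only
    holds for BMP text, because U+20000 arrives as the two surrogates D840 and DC00.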

    You seem to suggest that there is a problem with 16-bit Unicode. It does take some effort to adapt
    UCS-2-designed functions for UTF-16, but it's not "rocket science" and works very well thanks to the
    Unicode allocation practice (common characters in the BMP). Making UTF-8/32 functions work with
    supplementary code points when they had assumed BMP-only operation probably took some work too.
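
    As an illustration of why the UTF-16 adaptation is manageable: the main change is to look at the
    next unit when you see a lead surrogate. This is a minimal sketch in Python with 16-bit code units
    modeled as a list of integers; the function name is hypothetical, and passing unpaired surrogates
    through unchanged is just one possible error policy.

```python
def iter_code_points_utf16(units):
    """Yield code points from 16-bit units, combining lead+trail surrogate pairs.
    Unpaired surrogates are yielded unchanged (one possible error policy)."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        i += 1
        # Lead surrogate followed by a trail surrogate: combine into one code point.
        if 0xD800 <= u <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        else:
            # BMP code point, or an unpaired surrogate passed through as-is.
            yield u

units = [0x0041, 0xD840, 0xDC00, 0x4E2D, 0xDC00]  # 'A', U+20000, U+4E2D, lone trail
print([hex(cp) for cp in iter_code_points_utf16(units)])
# ['0x41', '0x20000', '0x4e2d', '0xdc00']
```

    BMP-only code paths stay a single comparison per unit, which is why the Unicode allocation practice
    (common characters in the BMP) keeps this cheap in practice.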

    In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t, but also UTF-8 (low-level tools
    and GNOME) and UTF-16 (ICU, KDE/Qt, and many applications like Mozilla and OpenOffice).
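
    For illustration, here is how one code point looks in each of these three encoding forms. This is
    a minimal hand-written sketch (hypothetical helper names) that assumes the input is a valid Unicode
    scalar value (no surrogates, at most U+10FFFF) and does no validation; real code would use a
    converter such as ICU or iconv.

```python
def encode_utf8(cp: int) -> bytes:
    """1 to 4 bytes per code point; assumes a valid scalar value."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def encode_utf16(cp: int) -> list[int]:
    """1 or 2 16-bit units; supplementary code points become a surrogate pair."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

def encode_utf32(cp: int) -> list[int]:
    """Always exactly one 32-bit unit."""
    return [cp]

for cp in (0x41, 0x4E2D, 0x20000):
    print(f"U+{cp:04X}", encode_utf8(cp).hex(" "), [f"{u:04X}" for u in encode_utf16(cp)])
# U+0041 41 ['0041']
# U+4E2D e4 b8 ad ['4E2D']
# U+20000 f0 a0 80 80 ['D840', 'DC00']
```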

    Best regards,

    Opinions expressed here may not reflect my company's positions unless otherwise noted.

    This archive was generated by hypermail 2.1.5 : Thu Nov 14 2002 - 13:12:29 EST