RE: IBM AIX 5 and GB18030

From: Carl W. Brown (
Date: Thu Nov 14 2002 - 20:25:21 EST


    My mistake. I should have checked the docs, but I thought that AIX used a
    16-bit wchar_t. In any case, it still takes forever to change things in an
    OS. There is so much interrelated code that fixing one thing can break
    another.


    > -----Original Message-----
    > From: [] On Behalf Of Markus Scherer
    > Sent: Thursday, November 14, 2002 9:18 AM
    > To: unicode
    > Subject: Re: IBM AIX 5 and GB18030
    >
    > Carl W. Brown wrote:
    > > Some Unix systems adapted faster because the later Unicode adopters
    > > used 32 bit Unicode characters making the job 100 times easier. Other
    > > companies like Microsoft took a very big gamble and implemented the
    > > code for surrogate support into Windows 2000 based on early drafts of
    > > the Unicode standard. If they had not done it this way or had guessed
    > > wrong they might not even have support in Windows XP.
    >
    > Hi Carl, I am not going to argue with you on what you say about ICU :-)
    > but I am not sure about your Unix comments.
    >
    > First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW
    > locale, as far as I know. (AIX 5 zh_TW uses a different wchar_t
    > encoding.)
    >
    > Again as far as I know, Unix/Linux systems chose to use 32-bit wchar_t
    > not because of great strategic plans or compelling performance analysis,
    > but because the existing C stdlib functions for wchar_t string handling
    > assume that the single-code-point type is the same as the string base
    > unit. This one design point requires 32-bit wchar_t not just for Unicode
    > but also for the character sets of EUC-TW and GB18030.
    >
    > You seem to suggest that there is a problem with 16-bit Unicode. It does
    > take some effort to adapt UCS-2-designed functions for UTF-16, but it's
    > not "rocket science" and works very well thanks to the Unicode
    > allocation practice (common characters in the BMP). Making UTF-8/32
    > functions work with supplementary code points when they had assumed
    > BMP-only operation probably took some work too.
    >
    > In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t, but
    > also UTF-8 (low-level tools and gnome) and UTF-16 (ICU, KDE/Qt, and many
    > applications like Mozilla and OpenOffice).
    >
    > Best regards,
    > markus
    >
    > --
    > Opinions expressed here may not reflect my company's positions unless
    > otherwise noted.
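
The point in the quoted message about the "single-code-point type" versus the "string base unit" can be illustrated by counting how many code units each encoding form needs for one code point. This is only a rough sketch: the function names are made up for illustration and do not come from AIX, the C library, or ICU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any real library. */

/* A Unicode scalar value is any code point up to U+10FFFF
   that is not in the surrogate range U+D800..U+DFFF. */
static int is_scalar_value(uint32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Number of 8-bit code units UTF-8 needs for cp (0 if cp is invalid). */
size_t utf8_units(uint32_t cp) {
    if (!is_scalar_value(cp)) return 0;
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Number of 16-bit code units UTF-16 needs for cp (0 if cp is invalid).
   Supplementary code points (above the BMP) need a surrogate pair. */
size_t utf16_units(uint32_t cp) {
    if (!is_scalar_value(cp)) return 0;
    return cp < 0x10000 ? 1 : 2;
}

/* Number of 32-bit code units UTF-32 needs for cp (0 if cp is invalid).
   Always one unit, so the code point type and the string unit coincide. */
size_t utf32_units(uint32_t cp) {
    return is_scalar_value(cp) ? 1 : 0;
}
```

For U+20000, a supplementary CJK ideograph, these return 4, 2, and 1 respectively. Only the 32-bit form keeps one string unit per code point, which is the property the wchar_t functions are described as assuming.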

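As a rough illustration of what adapting a UCS-2-designed function for UTF-16 involves, the sketch below counts code points rather than 16-bit units by stepping over surrogate pairs. The names are hypothetical and this is not how ICU, Windows, or any particular library implements it. Unpaired surrogates are passed through unchanged here, which is just one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from any real library. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Decode the code point starting at s[*i] and advance *i past it.
   The caller must ensure *i < len. An unpaired surrogate is returned
   as its own value (an assumption of this sketch). */
uint32_t utf16_next(const uint16_t *s, size_t len, size_t *i) {
    uint16_t hi = s[(*i)++];
    if (is_high_surrogate(hi) && *i < len && is_low_surrogate(s[*i])) {
        uint16_t lo = s[(*i)++];
        /* Combine the two 10-bit halves and add the 0x10000 offset. */
        return 0x10000 + (((uint32_t)(hi - 0xD800) << 10)
                          | (uint32_t)(lo - 0xDC00));
    }
    return hi;
}

/* A UCS-2 version would simply return len; the UTF-16 version
   counts one per code point instead of one per 16-bit unit. */
size_t utf16_count_code_points(const uint16_t *s, size_t len) {
    size_t i = 0, n = 0;
    while (i < len) {
        (void)utf16_next(s, len, &i);
        n++;
    }
    return n;
}
```

For the four-unit sequence 0x0041, 0xD840, 0xDC00, 0x4E2D this counts three code points, with the middle pair decoding to U+20000. BMP-only text takes the same path as it would under UCS-2.
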
    This archive was generated by hypermail 2.1.5 : Thu Nov 14 2002 - 21:07:06 EST