RE: IBM AIX 5 and GB18030

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Nov 14 2002 - 20:25:21 EST

Next message: Carl W. Brown: "UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"

Previous message: Michael (michka) Kaplan: "Re: Emergency help required!"
In reply to: Markus Scherer: "Re: IBM AIX 5 and GB18030"
Next in thread: Carl W. Brown: "UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"

Markus,

My mistake. I should have checked the docs, but I thought that AIX used a
16-bit wchar_t. In any case, it still takes forever to change things in an OS.
There is so much interrelated code that fixing one thing can break another.

Carl

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Markus Scherer
> Sent: Thursday, November 14, 2002 9:18 AM
> To: unicode
> Subject: Re: IBM AIX 5 and GB18030
>
>
> Carl W. Brown wrote:
> > Some Unix systems adapted faster because the later Unicode adopters
> > used 32 bit Unicode characters making the job 100 times easier. Other
> > companies like Microsoft took a very big gamble and implemented the
> > code for surrogate support into Windows 2000 based on early drafts of
> > the Unicode standard. If they had not done it this way or had guessed
> > wrong they might not even have support in Windows XP.
>
> Hi Carl, I am not going to argue with you on what you say about ICU :-)
> but I am not sure about your Unix comments.
>
> First, AIX 5 uses 32-bit wchar_t, which is UTF-32 except for the zh_TW
> locale, as far as I know. (AIX 5 zh_TW uses a different wchar_t
> encoding.)
>
> Again as far as I know, Unix/Linux systems chose to use 32-bit wchar_t
> not because of great strategic plans or compelling performance
> analysis, but because the existing C stdlib functions for wchar_t
> string handling assume that the single-code-point type is the same as
> the string base unit. This one design point requires 32-bit wchar_t
> not just for Unicode but also for the character sets of EUC-TW and
> GB18030.
>
> You seem to suggest that there is a problem with 16-bit Unicode. It
> does take some effort to adapt UCS-2-designed functions for UTF-16, but
> it's not "rocket science" and works very well thanks to the Unicode
> allocation practice (common characters in the BMP). Making UTF-8/32
> functions work with supplementary code points when they had assumed
> BMP-only operation probably took some work too.
>
> In fact, on Unix/Linux systems you find not only UTF-32 via wchar_t,
> but also UTF-8 (low-level tools and gnome) and UTF-16 (ICU, KDE/Qt, and
> many applications like Mozilla and OpenOffice).
>
> Best regards,
> markus
>
> --
> Opinions expressed here may not reflect my company's positions unless
> otherwise noted.
>
>
>
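
The point in the quoted message about adapting UCS-2-designed functions for
UTF-16 can be illustrated with a small C sketch. It is not taken from AIX,
ICU, or any other library mentioned in this thread; the function names
(is_lead, is_trail, combine_surrogates, utf16_count_code_points) are
hypothetical and chosen only for illustration. A UCS-2 routine would treat
every 16-bit unit as one character; the UTF-16 version has to recognize a
lead surrogate (U+D800..U+DBFF) followed by a trail surrogate
(U+DC00..U+DFFF) and treat the pair as one supplementary code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: classify a 16-bit unit as a lead or trail surrogate. */
static int is_lead(uint16_t u)  { return (u & 0xFC00u) == 0xD800u; }
static int is_trail(uint16_t u) { return (u & 0xFC00u) == 0xDC00u; }

/* Combine a valid surrogate pair into a code point in U+10000..U+10FFFF. */
static uint32_t combine_surrogates(uint16_t lead, uint16_t trail)
{
    assert(is_lead(lead) && is_trail(trail));
    return 0x10000u
         + ((uint32_t)(lead - 0xD800u) << 10)
         + (uint32_t)(trail - 0xDC00u);
}

/*
 * Count code points in a UTF-16 buffer of len 16-bit units.
 * A UCS-2 version would simply return len. Here a well-formed
 * surrogate pair counts once; an unpaired surrogate is counted
 * as a single (ill-formed) unit rather than rejected.
 */
static size_t utf16_count_code_points(const uint16_t *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        if (is_lead(s[i]) && i + 1 < len && is_trail(s[i + 1]))
            i += 2;   /* supplementary code point: two units */
        else
            i += 1;   /* BMP code point or unpaired surrogate */
        count++;
    }
    return count;
}
```

For the four units 0x0041, 0xD835, 0xDC00, 0x4E2D the counting function
reports three code points, and the middle pair combines to U+1D400. With a
32-bit wchar_t holding UTF-32 no such pairing step is needed, since each
array element is already a whole code point.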

This archive was generated by hypermail 2.1.5 : Thu Nov 14 2002 - 21:07:06 EST