UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Nov 14 2002 - 21:03:04 EST


    Markus,

    > You seem to suggest that there is a problem with 16-bit Unicode.
    > It does take some effort to adapt
    > UCS-2-designed functions for UTF-16, but it's not "rocket
    > science" and works very well thanks to the
    > Unicode allocation practice (common characters in the BMP).
    > Making UTF-8/32 functions work with
    > supplementary code points when they had assumed BMP-only
    > operation probably took some work too.

    Converting from UCS-2 to UTF-16 is just like converting from SBCS to DBCS.
    For folks who already think in DBCS terms, it is no problem. Those who went
    from DBCS to Unicode to simplify their lives are, I am sure, not happy.
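    The DBCS analogy can be made concrete: a UTF-16 high surrogate plays the
    role of a DBCS lead byte, so code that used to treat every 16-bit unit as
    one character now has to look ahead. The sketch below is illustrative
    only; the function name code_points_from_utf16 is hypothetical, and it
    assumes the input is a plain list of 16-bit code units.

```python
def code_points_from_utf16(units):
    """Decode a list of 16-bit code units into code points (hypothetical helper)."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        # Like a DBCS lead byte: a high surrogate followed by a low surrogate
        # combines into one supplementary code point.
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through unchanged.
            result.append(u)
            i += 1
    return result


# "A" followed by U+1D11E, which UTF-16 stores as the pair D834 DD1E.
units = [0x0041, 0xD834, 0xDD1E]
print(len(units))                          # a UCS-2 style count: 3
print(len(code_points_from_utf16(units)))  # a surrogate-aware count: 2
```

    Because common characters sit in the BMP, the two-unit branch is rarely
    taken, which is why this adaptation is usually cheap in practice.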

    I think the worst problem is that many systems still sort UTF-16 in binary
    (code unit) order rather than code point order. Then you get Oracle and the
    like wanting to set up a UTF-8 variant that encodes each surrogate rather
    than the character.
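    A small sketch of why the two orders disagree, and why encoding each
    surrogate separately is attractive to a system that sorts UTF-16 in binary.
    The surrogate-per-sequence scheme shown here is the one documented as
    CESU-8 in Unicode Technical Report #26; this is an illustration, not
    Oracle's implementation. The helper names utf16_units, cesu8 and show are
    hypothetical.

```python
def utf16_units(s):
    """Return the UTF-16 code units of a str as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def cesu8(s):
    """UTF-8-style bytes, but each UTF-16 code unit (surrogates included) is encoded on its own."""
    out = bytearray()
    for u in utf16_units(s):
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Every other unit, including each surrogate, becomes a 3-byte sequence.
            out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)


def show(words):
    return [f"U+{ord(w):04X}" for w in words]


words = ["\uFF61", "\U00010000"]
print(show(sorted(words)))                                  # code point order: ['U+FF61', 'U+10000']
print(show(sorted(words, key=utf16_units)))                 # UTF-16 binary order: ['U+10000', 'U+FF61']
print(show(sorted(words, key=lambda w: w.encode("utf-8")))) # UTF-8 binary order: ['U+FF61', 'U+10000']
print(show(sorted(words, key=cesu8)))                       # CESU-8 binary order: ['U+10000', 'U+FF61']
```

    Standard UTF-8 bytes sort in code point order, so they disagree with a
    UTF-16 binary sort once supplementary characters appear. Encoding the
    surrogates individually reproduces the UTF-16 binary order, at the cost of
    a second, incompatible form of "UTF-8" for the same text. For BMP-only
    text the two byte forms are identical.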

    However, 16-bit characters were a hard enough sell in the good old days. If
    we had started out with 32-bit characters, we would still be dreaming about
    Unicode.

    Carl


