UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Thu Nov 14 2002 - 21:03:04 EST


    > You seem to suggest that there is a problem with 16-bit Unicode.
    > It does take some effort to adapt UCS-2-designed functions for
    > UTF-16, but it's not "rocket science" and works very well thanks
    > to the Unicode allocation practice (common characters in the BMP).
    > Making UTF-8/32 functions work with supplementary code points when
    > they had assumed BMP-only operation probably took some work too.

    Converting from UCS-2 to UTF-16 is just like converting from SBCS to DBCS.
    For folks who already think in DBCS terms it is no problem. Those who went
    from DBCS to Unicode to simplify their lives are, I am sure, not happy.
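
    To make the SBCS-to-DBCS comparison concrete, here is a minimal Python
    sketch of walking UTF-16 code units the way DBCS code walks lead and
    trail bytes. The function names are made up for illustration and do
    not come from any library. A unit that is not a high surrogate stands
    alone; a high surrogate followed by a low surrogate combines into one
    supplementary code point.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def code_points(units):
    """Combine surrogate pairs into code points, much like lead/trail bytes."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Two code units make one supplementary character.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # A BMP character (or an unpaired surrogate, passed through as-is).
            out.append(u)
            i += 1
    return out

units = utf16_code_units("a\U00010400")
print([hex(u) for u in units])
print([hex(cp) for cp in code_points(units)])
```

    Code written for UCS-2 that simply counts or indexes 16-bit units
    would treat the pair as two characters, which is the same class of bug
    that SBCS code shows when it meets a DBCS lead byte.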

    I think the worst problem is that many systems still sort in binary order,
    not code point order. Then you get Oracle and the like wanting to set up a
    UTF-8 variant that encodes each surrogate separately rather than the
    character as a whole.
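
    The ordering difference can be shown with two characters: U+FF5E, a
    BMP character above the surrogate range, and U+10000, the first
    supplementary character. The Python sketch below is only an
    illustration; surrogate_wise_utf8 is a hypothetical helper that
    encodes each UTF-16 code unit as its own UTF-8 sequence, which is my
    reading of the kind of variant described above, not a vendor's actual
    implementation.

```python
def utf16_key(s):
    """Sort key: the raw big-endian UTF-16 bytes (binary order)."""
    return s.encode("utf-16-be")

def surrogate_wise_utf8(s):
    """Encode each UTF-16 code unit, surrogates included, as UTF-8 (hypothetical)."""
    data = s.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    # 'surrogatepass' lets Python emit the 3-byte form of a lone surrogate.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

words = ["\U00010000", "\uFF5E"]

by_code_point = sorted(words)                    # Python compares str by code point
by_utf16_binary = sorted(words, key=utf16_key)   # D800 DC00 sorts before FF5E
by_utf8_binary = sorted(words, key=lambda s: s.encode("utf-8"))
by_variant = sorted(words, key=surrogate_wise_utf8)

print(by_code_point == by_utf8_binary)   # standard UTF-8 keeps code point order
print(by_utf16_binary == by_variant)     # the variant keeps UTF-16 binary order
```

    Standard UTF-8 spends four bytes on U+10000, while the surrogate-wise
    form spends six (three per surrogate), so the two are not
    interchangeable on the wire even though both look like UTF-8.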

    However, 16-bit characters were a hard enough sell in the good old days. If
    we had started out with 32-bit characters we would still be dreaming about

