Re: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Nov 14 2002 - 23:26:20 EST


    Carl W. Brown <cbrown at xnetinc dot com> wrote:

    > Converting from UCS-2 to UTF-16 is just like converting from SBCS to
    > DBCS. For folks who think in DBCS, it is no problem. Those who went
    > from DBCS to Unicode to simplify their lives are, I am sure, not happy.
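    Carl's analogy can be made concrete with a minimal Python sketch. The
    helper names `utf16_code_units` and `decode_utf16` are hypothetical, and
    Python's built-in "utf-16-be" codec is used only to produce code units.
    A UCS-2 mindset assumes one 16-bit unit per character. In UTF-16 a
    supplementary character such as U+20000 takes a surrogate pair, so the
    count of code units and the count of characters diverge, much as they
    do in a DBCS.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s as 16-bit integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def decode_utf16(units: list[int]) -> list[int]:
    """Decode UTF-16 code units to code points, joining surrogate pairs."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Lead + trail surrogate: one character in two code units.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Any other unit is a character by itself (an unpaired
            # surrogate is passed through unchanged in this sketch).
            out.append(u)
            i += 1
    return out


text = "A\u4E2D\U00020000"  # 'A', U+4E2D (BMP ideograph), U+20000 (Ext. B)
units = utf16_code_units(text)
print([hex(u) for u in units])               # ['0x41', '0x4e2d', '0xd840', '0xdc00']
print(len(units), len(decode_utf16(units)))  # 4 3
```

    Three characters, four code units: the fixed-width assumption breaks as
    soon as the first supplementary character appears.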

    Ken made me laugh last March by referring to this as

        "... a bait and switch tactic, whereby implementers were lulled
        into thinking they had a simple, fixed-width 16-bit system, only
        to discover belatedly that they had bought into yet another
        mixed-width character encoding after all."

    At least with surrogate pairs, we don't have to deal with overlapping
    ranges for lead bytes and trail bytes, or for trail bytes and
    single-byte characters, and we don't have to go through crazy gymnastics
    to "find the last lead byte" if we ever get lost in the middle of a
    UTF-16 string.
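    The resynchronization point can be sketched in Python, assuming a list
    of 16-bit code units (the names `is_high`, `is_low`, and `char_start`
    are hypothetical). High surrogates (D800-DBFF), low surrogates
    (DC00-DFFF), and all other BMP values occupy disjoint ranges. Finding
    the start of the character at an arbitrary index therefore needs at
    most one step back, rather than a backward scan for the last lead byte.

```python
def is_high(u: int) -> bool:
    """Lead (high) surrogate: D800-DBFF."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    """Trail (low) surrogate: DC00-DFFF."""
    return 0xDC00 <= u <= 0xDFFF


def char_start(units: list[int], i: int) -> int:
    """Index of the first code unit of the character that contains units[i].

    Lead, trail, and non-surrogate values never overlap, so looking at
    units[i] and at most units[i - 1] is enough.
    """
    if is_low(units[i]) and i > 0 and is_high(units[i - 1]):
        return i - 1
    return i


# "x", U+1F600, "y" as UTF-16 code units
units = [0x0078, 0xD83D, 0xDE00, 0x0079]
print([char_start(units, i) for i in range(len(units))])  # [0, 1, 1, 3]
```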

    > I think the worst problem is that many systems still sort in binary
    > rather than code point order. Then you get Oracle and the like wanting
    > to set up a UTF-8 variant that encodes each surrogate rather than the
    > character.
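    Both of Carl's points can be illustrated with a short Python sketch,
    assuming well-formed input (all function names are hypothetical).
    Comparing UTF-16 code units as plain 16-bit integers puts every
    supplementary character (lead surrogate D800-DBFF) before BMP characters
    in E000-FFFF, which is not code point order. One common fix-up remaps
    the code units before comparing. The UTF-8 variant that encodes each
    surrogate separately was later documented as CESU-8 (Unicode Technical
    Report #26); its byte order follows binary UTF-16 order, not code point
    order.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s as integers."""
    b = s.encode("utf-16-be")
    return [int.from_bytes(b[i:i + 2], "big") for i in range(0, len(b), 2)]


def code_point_order_key(s: str) -> list[int]:
    """Remap UTF-16 code units so that comparing them gives code point order.

    E000-FFFF moves down to D800-F7FF and surrogates D800-DFFF move up to
    F800-FFFF, so supplementary characters sort after the whole BMP.
    """
    key = []
    for u in utf16_units(s):
        if u >= 0xE000:
            u -= 0x800
        elif u >= 0xD800:
            u += 0x2000
        key.append(u)
    return key


def cesu8(s: str) -> bytes:
    """Encode each UTF-16 code unit (including each surrogate) as UTF-8 bytes."""
    lone_units = "".join(chr(u) for u in utf16_units(s))
    return lone_units.encode("utf-8", "surrogatepass")


a = "\uFF61"       # U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP (BMP)
b = "\U00010000"   # U+10000, the first supplementary code point

print(utf16_units(a) < utf16_units(b))                    # False: binary UTF-16 order
print(code_point_order_key(a) < code_point_order_key(b))  # True: code point order
print(cesu8(b).hex(), b.encode("utf-8").hex())            # eda080edb080 f0908080
print(cesu8(a) < cesu8(b), a.encode("utf-8") < b.encode("utf-8"))  # False True
```

    Standard UTF-8 already sorts bytewise in code point order; CESU-8 trades
    that property for agreement with systems that sort UTF-16 in binary.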

    As Michka noted, the mechanism for surrogates has existed for almost a
    decade now. Individuals and companies that ignored surrogates because
    "there aren't any characters there anyway, and when they do add some
    they'll be extremely rare," and are now behind in supporting UTF-16,
    really have nobody else to blame.

    > However, 16 bit characters were a hard enough sell in the good old
    > days. If we had started out withug 2bit characters we would still be
    > dreaming about Unicode.

    I think Carl meant "with 32-bit characters." I don't know what kind of
    word "withug" is (Old English?), but I like it.

    -Doug Ewell
     Fullerton, California



    This archive was generated by hypermail 2.1.5 : Fri Nov 15 2002 - 00:14:28 EST