Re: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Nov 14 2002 - 23:26:20 EST

Next message: Carl W. Brown: "RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"

Previous message: Carl W. Brown: "UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"
In reply to: Carl W. Brown: "UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"
Next in thread: Carl W. Brown: "RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"
Reply: Carl W. Brown: "RE: UTF-16 vs UTF-32 (was IBM AIX 5 and GB18030"

Carl W. Brown <cbrown at xnetinc dot com> wrote:

> Converting from UCS-2 to UTF-16 is just like converting from SBCS to
> DBCS. For folks who think DBCS it is no problem. Those who went from
> DBCS to Unicode to simplify their lives I am sure are not happy.

Ken made me laugh last March by referring to this as

    "... a bait and switch tactic, whereby implementers were lulled
    into thinking they had a simple, fixed-width 16-bit system, only
    to discover belatedly that they had bought into yet another
    mixed-width character encoding after all."

At least with surrogate pairs, we don't have to deal with overlapping
ranges for lead bytes and trail bytes, or for trail bytes and
single-byte characters, and we don't have to go through crazy gymnastics
to "find the last lead byte" if we ever get lost in the middle of a
UTF-16 string.
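
To make that concrete, here is a minimal sketch in Python of resynchronizing from an arbitrary position in a sequence of UTF-16 code units. The helper names (to_units, code_point_start) are made up for illustration and do not come from any library. Because the high-surrogate range (D800..DBFF), the low-surrogate range (DC00..DFFF), and the ordinary BMP code units never overlap, looking at one unit and at most one neighbor is enough:

```python
def is_high_surrogate(unit):
    # Lead (high) surrogates occupy D800..DBFF and nothing else does.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Trail (low) surrogates occupy DC00..DFFF and nothing else does.
    return 0xDC00 <= unit <= 0xDFFF


def to_units(text):
    # Hypothetical helper: split a Python string into 16-bit UTF-16 code units.
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]


def code_point_start(units, i):
    # Return the index of the first code unit of the character that
    # contains units[i]. At most one step backward is ever needed.
    if i > 0 and is_low_surrogate(units[i]) and is_high_surrogate(units[i - 1]):
        return i - 1
    return i


units = to_units("a\U00010400b")   # 'a', DESERET CAPITAL LETTER LONG I, 'b'
start = code_point_start(units, 2)  # index 2 is the trail surrogate
```

With a legacy DBCS, the same question ("where does the character containing this byte begin?") can require scanning backward an unbounded distance, since a trail byte value may also be a valid lead byte or single-byte character.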

> I think that worst problem is that many systems still sort in binary
> not code point order. Then you get Oracle and the like wanting to set
> up a UTF-8 variant that encode each surrogate rather than the
> character.

As Michka noted, the mechanism for surrogates has existed for almost a
decade now. Individuals and companies that ignored surrogates because
"there aren't any characters there anyway, and when they do add some
they'll be extremely rare," and are now behind in supporting UTF-16,
really have nobody else to blame.
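
For anyone wondering what Carl's sorting point looks like in practice, here is a rough sketch in Python. The function names are invented for the example, and encode_each_surrogate is only my illustration of "a UTF-8 variant that encodes each surrogate" (the general scheme documented as CESU-8), not any vendor's actual code. It compares one BMP character above the surrogate range with one supplementary character:

```python
def utf16_units(text):
    # Hypothetical helper: the string as a list of 16-bit UTF-16 code units.
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]


def encode_each_surrogate(text):
    # Apply the UTF-8 bit layout to every UTF-16 code unit separately,
    # so a supplementary character becomes two 3-byte sequences
    # instead of one 4-byte sequence.
    out = bytearray()
    for u in utf16_units(text):
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            out += bytes([0xE0 | (u >> 12),
                          0x80 | ((u >> 6) & 0x3F),
                          0x80 | (u & 0x3F)])
    return bytes(out)


bmp = "\uFF21"       # FULLWIDTH LATIN CAPITAL LETTER A
supp = "\U00010400"  # DESERET CAPITAL LETTER LONG I

by_code_point = sorted([supp, bmp])                           # Python compares str by code point
by_utf16_binary = sorted([bmp, supp], key=utf16_units)
by_utf8_binary = sorted([bmp, supp], key=lambda s: s.encode("utf-8"))
by_variant_binary = sorted([bmp, supp], key=encode_each_surrogate)
```

Here by_code_point and by_utf8_binary put U+FF21 first, while by_utf16_binary and by_variant_binary put U+10400 first, because its lead surrogate D801 compares lower than FF21. That mismatch is the reason a system that sorts UTF-16 in binary would want a byte encoding that sorts the same way.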

> However, 16 bit characters were a hard enough sell in the good old
> days. If we had started out withug 2bit characters we would still be
> dreaming about Unicode.

I think Carl meant "with 32-bit characters." I don't know what kind of
word "withug" is (Old English?), but I like it.

-Doug Ewell
Fullerton, California

This archive was generated by hypermail 2.1.5 : Fri Nov 15 2002 - 00:14:28 EST