Re: Double Byte enabled

From: A. Vine (avine@eng.sun.com)
Date: Thu Apr 06 2000 - 13:10:20 EDT


Murray Sargent wrote:
>
> MBCS is a generic term that includes SBCS, DBCS, and character sets with
> more than two bytes. In an operational sense UTF-8 is a kind of MBCS, since
> in order to deal with it directly (rather than translating it to UTF-16 or
> UTF-32) you have to navigate over 1 to 4 bytes. (5 and 6 are ruled out by
> recent standards activities). A cool thing about UTF-8 is that you can
> easily find the start of a character if you land on a trail byte. But you
> still have to deal with other problems of MBCS, such as ensuring that the
> text cursor (or caret) always points to the start of a character, and saving
> for the next read any partial character sequence that ends an input buffer
> (if you need to translate to UTF-16 or UTF-32).
>
> UTF-16 surrogate pairs have similar considerations, but they are relatively
> easy to deal with, especially if your code can already handle multicharacter
> sequences such as CR LF and combining-mark sequences.
>
> Again, the thing I'd recommend is Unicode enabling rather than MBCS or DBCS
> enabling.
>
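
To make the two UTF-8 points above concrete (finding the start of a
character from a trail byte, and holding back a partial sequence at the end
of an input buffer), here is a minimal Python sketch. It assumes the input
is otherwise well-formed UTF-8, and the names char_start and split_complete
are illustrative only, not taken from any library.

```python
def char_start(buf, i):
    """Back up from index i to the first byte of the UTF-8 character containing it."""
    # Trail (continuation) bytes always have the bit pattern 10xxxxxx.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def split_complete(buf):
    """Split buf into (complete characters, partial trailing sequence)."""
    if not buf:
        return buf, b""
    start = char_start(buf, len(buf) - 1)
    lead = buf[start]
    # The lead byte announces the total length of the sequence (1 to 4 bytes).
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        need = 1  # stray trail byte or invalid lead; left for the decoder to reject
    if len(buf) - start < need:
        # The last character is cut off: save its bytes for the next read.
        return buf[:start], buf[start:]
    return buf, b""


data = "a\u20ac\U0001F600".encode("utf-8")  # 1-byte, 3-byte and 4-byte characters
head, tail = split_complete(data[:6])       # buffer ends inside the 4-byte character
```

Here head holds the two complete characters and tail holds the first two
bytes of the third, to be prepended to the next buffer before decoding.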

My definition of multi-byte coincides with Murray's: it's a generic term for
charsets whose characters vary in byte length. Double-byte and single-byte
are fixed-width. Using "double-byte" to refer to a charset that mixes
single-byte and double-byte characters is misleading, IMHO.
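
A quick way to see the distinction is to measure per-character byte lengths
in a few encodings. The Python sketch below is only an illustration:
byte_lengths is a made-up helper name, and UTF-16 restricted to BMP
characters stands in for a true fixed double-byte encoding.

```python
def byte_lengths(text, encoding):
    """Return the sorted distinct per-character byte lengths of text in encoding."""
    return sorted({len(ch.encode(encoding)) for ch in text})


sample = "A\u3042"  # LATIN CAPITAL LETTER A and HIRAGANA LETTER A

print(byte_lengths(sample, "shift_jis"))   # [1, 2]: variable length, so multi-byte
print(byte_lengths(sample, "utf-16-be"))   # [2]: fixed two bytes for these BMP characters
print(byte_lengths("A\u00e9", "latin-1"))  # [1]: fixed single byte
```

Shift_JIS is often called a "double-byte" charset, yet it reports both 1 and
2 here, which is the mixed case described above.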

Andrea

-- 
Andrea Vine, avine@eng.sun.com, iPlanet i18n architect
...even if it requires not really a dance with the Devil, but 
call it a brief shimmy with his accountant's daughter.
-- Sean Burke http://www.netadventure.net/~sburke/


