RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

From: Peter_Constable@sil.org
Date: Tue Feb 20 2001 - 09:00:25 EST


On 02/20/2001 03:34:28 AM Marco Cimarosti wrote:

>How about considering UTF-32 as the default Unicode form, in order to be
>able to provide a short answer of this kind:
>
> "Unicode is now a 32-bit character encoding standard, although only
>about one million of codes actually exist, and there are ways of
>representing Unicode characters as sequences of 8-bit bytes or 16-bit
>words."

Well, it's probably a better answer to say that Unicode is a 20.1-bit
encoding, since the direct encoding of characters is the coded character set
-- i.e. the Unicode scalar values -- and the codespace for these requires
20.1 bits.
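
(If anyone wants to see where the 20.1 figure comes from, here's a quick
back-of-the-envelope calculation, sketched in Python -- the codespace runs
from U+0000 to U+10FFFF, i.e. 0x110000 code points:)

    import math

    # The Unicode codespace spans U+0000..U+10FFFF.
    codespace_size = 0x110000          # 1,114,112 code points
    print(math.log2(codespace_size))   # ~20.09 -- hence "20.1 bits"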

Of course, saying "20.1 bits" only invites a bunch of questions that will
inevitably lead to the same drawn-out answers, so perhaps you can fudge a
bit and say that it's "basically a 21-bit encoding standard". If you get a
response along the lines that no machine architecture is based on 21 bits,
then you say that there are 8-, 16- and 32-bit encoding forms that are each
capable of representing the entire coded character set. At that point,
they'll either be content, feel that they're in over their heads (and that
you must be superintelligent), or they'll be receptive and want to know
more, at which point you direct them to UTR#17.
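
(If they want something concrete, a small illustration helps -- here sketched
in Python, using U+10330 GOTHIC LETTER AHSA purely as an arbitrary example of
a character outside the BMP. Each encoding form carries the same character,
just in different-sized code units:)

    ch = "\U00010330"                    # U+10330, a supplementary-plane character
    print(ch.encode("utf-8").hex())      # f0908cb0 -- four 8-bit code units
    print(ch.encode("utf-16-be").hex())  # d800df30 -- two 16-bit code units (a surrogate pair)
    print(ch.encode("utf-32-be").hex())  # 00010330 -- one 32-bit code unit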

Ciao!
- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


