RE: Playing with Unicode (was: Re: UTF-17)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Jun 25 2001 - 10:23:27 EDT


Elliotte Rusty Harold wrote:
> What about ISO-10646-UCS-2 and ISO-10646-UCS-4 as used in XML? Where
> do they fit in? Are they only part of ISO-10646 and not Unicode? or
> are they identical to UTF-16 and UTF-32? or something else?

I didn't include them just because they don't start with "UTF", and all the
recent jokes start with that acronym. So far, no one has proposed any humorous
"UCS-something".

About UCS-2, my understanding is that it does not support surrogates, so it
should be limited to the range U+0000..U+FFFD (16 bits; U+FFFE and U+FFFF are
noncharacters).
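
A minimal sketch of the surrogate point, assuming Python 3 and its built-in
"utf-16-be" codec. The helper names utf16_units and fits_in_16_bits are made
up for illustration. A code point above U+FFFF takes two 16-bit units in
UTF-16, which a fixed-width 16-bit form without surrogates cannot express:

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2


def fits_in_16_bits(ch: str) -> bool:
    """Hypothetical helper: True if the character is a single 16-bit unit,
    i.e. representable without a surrogate pair."""
    return ord(ch) <= 0xFFFF


bmp = "\u20ac"       # EURO SIGN, inside the 16-bit range
supp = "\U00010400"  # a code point beyond U+FFFF

print(utf16_units(bmp), fits_in_16_bits(bmp))
print(utf16_units(supp), fits_in_16_bits(supp))
print(supp.encode("utf-16-be").hex())  # high surrogate, then low surrogate
```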

UTF-32 is limited to ..U+10FFFD (21 bits; U+10FFFE and U+10FFFF are
noncharacters), while UCS-4 theoretically reaches U+7FFFFFFD (31 bits).
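
The bit counts can be checked with a short sketch, assuming Python 3. It uses
the top code point of each range (U+FFFF, U+10FFFF, U+7FFFFFFF) rather than
the last usable character, and the constant names are illustrative only. The
31-bit UCS-4 ceiling is taken from this message, not verified against the
standard here:

```python
# Top code point of each range (assumed values, see the note above).
LIMITS = {
    "UCS-2": 0xFFFF,
    "UTF-32": 0x10FFFF,
    "UCS-4": 0x7FFFFFFF,
}

# int.bit_length() gives the number of bits needed to represent the value.
bits = {name: limit.bit_length() for name, limit in LIMITS.items()}

for name, limit in LIMITS.items():
    print(f"{name}: up to U+{limit:X}, {bits[name]} bits")
```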

There are certainly other differences that I don't know about. Do they differ
in endianness?

_ Marco


