Re: U+xxxx, U-xxxxxx, and the basics

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Mar 08 2000 - 11:05:15 EST


On Wed, Mar 08, 2000 at 03:59:40PM -0000, Marco.Cimarosti@icl.com wrote:
> Keld Jørn Simonsen wrote, responding to me:
> >>I understood that Unicode had extended beyond the
> >>0x0..0xFFFF range. The fact that no code point
> >>is assigned yet in the 0x10000..0x10FFFF range
> >>does not mean that these code points don't exist.
> >
> > Yes, but my last reading was that surrogates are characters.
> > Maybe it was changed with 3.0
>
> Uhm... Probably they are: the meaning of "character" gets vaguer every
> day.
>
> But this brings another question: what is the role of surrogates if I am
> using 32-bit units?
>
> Consider this *UCS-4* fragment:
>
> ... U-00D800 U-00DC00 ...
>
> What kind of animal would that be!? An (absurd) sequence of two characters
> or an alternative spelling for U-010000?

The surrogate codes that UTF-16 uses to extend into planes 1-16
are not allowed in UCS-4 (or in UTF-8, for that matter).
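
As a rough illustration of that rule (a sketch only; the function
names are made up for this example and come from no particular
library), the following Python shows how the pair D800 DC00 maps to
U-010000 when read as UTF-16, and how a check over 32-bit units
would flag the same two values as errors instead of combining them:

```python
# Illustrative sketch; combine_surrogates and find_surrogates_in_ucs4
# are hypothetical helper names, not part of any standard API.

HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trailing) surrogates


def combine_surrogates(high, low):
    """Map a UTF-16 surrogate pair to the code point it stands for."""
    if not (HIGH_START <= high <= HIGH_END):
        raise ValueError("first unit is not a high surrogate")
    if not (LOW_START <= low <= LOW_END):
        raise ValueError("second unit is not a low surrogate")
    # Each surrogate carries 10 bits; the result starts at plane 1.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def find_surrogates_in_ucs4(units):
    """Return the indices of 32-bit units that fall in the surrogate range.

    Under the rule stated above, any such unit is an error in UCS-4;
    it is never paired with its neighbour.
    """
    return [i for i, u in enumerate(units) if HIGH_START <= u <= LOW_END]


if __name__ == "__main__":
    # In UTF-16, D800 DC00 is the spelling of U-010000.
    print(hex(combine_surrogates(0xD800, 0xDC00)))

    # In UCS-4, the same two values are simply two invalid units.
    fragment = [0x0041, 0xD800, 0xDC00, 0x10000]
    print(find_surrogates_in_ucs4(fragment))
```

In UCS-4 the character is written directly as the single unit
0x00010000, so there is no need for an alternative spelling.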

Keld


