Rick Cameron asked:
> Are you planning to add an explicit statement to the Unicode standard that
> the valid range for scalar values is 0..10FFFF? (Or is such a statement
> there, and I've just missed it?)
Unicode 3.0, p. 45, D28:
Unicode scalar value: a number N from 0 to 10FFFF₁₆...
and p. 46, D29, second bullet:
* Any sequence of code values that would correspond to a scalar value
greater than 10FFFF₁₆ is illegal.
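
As an illustration of the bound quoted above, here is a minimal C sketch of a range check. The names UNICODE_MAX_SCALAR and in_scalar_range are hypothetical and do not come from the standard. The sketch tests only the upper limit from D28/D29 and says nothing about surrogate code values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound quoted from D28/D29: 10FFFF hexadecimal.
   The macro name is an assumption made for this sketch. */
#define UNICODE_MAX_SCALAR 0x10FFFFu

/* Hypothetical helper: true if n lies in the range 0..10FFFF.
   Only the upper limit is checked; surrogates are not considered. */
static bool in_scalar_range(uint32_t n)
{
    return n <= UNICODE_MAX_SCALAR;
}
```

With this, in_scalar_range(0x10FFFF) is true and in_scalar_range(0x110000) is false, even though both fit comfortably in a 32-bit variable.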
>
> In the absence of such a statement, I think it's very easy for people to get
> the idea that the range of scalar values is unbounded above, and that any
> limit is a property of a particular encoding.
>
> In particular, as the use of 32-bit variables to hold Unicode characters
> becomes more common (apparently most unices make wchar_t 32 bits wide), many
> will imagine that such a variable represents a 32-bit encoding of Unicode,
> with range 0..FFFFFFFF, where it just happens that every value above 10FFFF
> is unassigned.
>
> I am one such person (but no longer!)
>
> Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit encoding -
> but that's not stopping uniphiles from storing Unicode data in their
> wchar_t's!
It's the Unicode Standard 3.1 that you need to be referring to.
UTF-32 was incorporated into the standard at that point. See
http://www.unicode.org/unicode/reports/tr27/
--Ken
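
To make the point about 32-bit storage concrete, the following C sketch scans a buffer of 32-bit units and reports the first one above 10FFFF. It is only an illustration: first_out_of_range is a hypothetical name, uint32_t is used in place of wchar_t because the width of wchar_t varies by platform, and only the upper bound is checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first element greater
   than 10FFFF (hex), or len if every element is within range.
   Surrogate values are not examined in this sketch. */
static size_t first_out_of_range(const uint32_t *units, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (units[i] > 0x10FFFFu) {
            return i;
        }
    }
    return len;
}
```

A caller can compare the result with len: equality means no element exceeded the limit.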