Dan <Dan.Oscarsson@trab.se> wrote:
>> Bytes %d247-253 are technically legal but will never be needed,
>> as Unicode/ISO 10646 will never grow beyond hex 0010FFFF except for
>> deprecated additional private-use zones that predate Unicode,
>> and bytes %d254-255 are outright illegal.
>
> ISO 10646 is 31 bits. All possible values should be allowed.
> I do not know why Unicode have decided to grow their bits to
> more than 16 bits, but not to all 31 bits of ISO 10646.
> But that is no reason to not allow full 31 bits in UTF-8 encoded
> text.
There IS a reason: to allow all of Unicode to be expressed in UTF-16,
whose surrogate pairs cannot reach beyond U-0010FFFF.
You may certainly write your code to understand all 31 bits, but no
values beyond U-0010FFFF will be assigned, so the extra code will be
unnecessary (although harmless).
Please see Unicode Technical Report #19 (UTF-32) for more information.
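
To make the point concrete, here is a rough Python sketch, not taken from
any standard or library. The names utf16_ceiling, sequence_length and
decode_31bit are made up for illustration. It assumes the original 1- to
6-byte UTF-8 layout and deliberately skips checks a real decoder needs
(overlong forms, surrogates). The first function shows where the 0010FFFF
ceiling comes from; the other two show that accepting the full 31 bits
costs only a couple of extra branches.

```python
def utf16_ceiling():
    """Largest value a UTF-16 surrogate pair can express."""
    high, low = 0xDBFF, 0xDFFF  # last high surrogate, last low surrogate
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def sequence_length(lead):
    """Sequence length implied by a lead byte in the 31-bit UTF-8 scheme.

    Returns 0 for bytes that cannot start a sequence.
    """
    if lead < 0x80:
        return 1
    if lead < 0xC0:
        return 0  # continuation byte, not a lead byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5  # never needed for values up to 0x10FFFF
    if lead < 0xFE:
        return 6  # never needed for values up to 0x10FFFF
    return 0  # 0xFE and 0xFF are never legal


def decode_31bit(data):
    """Decode one sequence at the start of `data`; return (value, length).

    Illustration only: no overlong-form or surrogate checks.
    """
    n = sequence_length(data[0])
    if n == 0 or len(data) < n:
        raise ValueError("invalid or truncated sequence")
    if n == 1:
        return data[0], 1
    value = data[0] & (0x7F >> n)  # payload bits of the lead byte
    for b in data[1:n]:
        if b & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value, n
```

For example, decode_31bit(b"\xf4\x8f\xbf\xbf") gives (0x10FFFF, 4), the
last value UTF-16 can reach, while the six-byte sequence FD BF BF BF BF BF
decodes to 0x7FFFFFFF, a value that will never be assigned.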
-Doug Ewell
Fullerton, California