Re: Surrogates and noncharacters

From: Doug Ewell <doug_at_ewellic.org>
Date: Mon, 11 May 2015 10:44:19 -0700

Hans Aberg <haberg dash 1 at telia dot com> wrote:

>>> However I wonder what would be the effect of D80 in UTF-32: is
>>> <0xFFFFFFFF> a valid "32-bit string" ?
>>
>> The value 0xFFFFFFFF cannot appear in a UTF-32 string. Therefore it
>> cannot represent a unit of encoded text in a UTF-32 string.
>
> Even though the values with highest bit set are not a part of original
> UTF-32, it can easily be extended also to original UTF-8, which may be
> simpler to implement.

"Original UTF-8," regardless of where defined, only ever encoded values
up to 0x7FFFFFFF (31 bits), so 0xFFFFFFFF falls outside it as well.
See, for example, RFC 2279.
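
To make the two limits concrete, here is a minimal sketch, assuming the
six-octet layout described in RFC 2279 and the Unicode rule that UTF-32
code units are scalar values. The function names are hypothetical and
not taken from any library.

```python
def is_utf32_code_unit(value: int) -> bool:
    """True if value may appear as a code unit in a UTF-32 string."""
    # UTF-32 code units are Unicode scalar values: 0..0x10FFFF,
    # excluding the surrogate range 0xD800..0xDFFF.
    return 0 <= value <= 0x10FFFF and not (0xD800 <= value <= 0xDFFF)


def rfc2279_length(value: int) -> int:
    """Octets the RFC 2279 form of UTF-8 needs for value (1 to 6)."""
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit range covered by RFC 2279")
    # Largest value representable with 1, 2, ... 6 octets.
    limits = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    for octets, limit in enumerate(limits, start=1):
        if value <= limit:
            return octets
    raise AssertionError("unreachable: range was checked above")


def rfc2279_encode(value: int) -> bytes:
    """Encode value with the original (up to six octet) UTF-8 scheme."""
    n = rfc2279_length(value)
    if n == 1:
        return bytes([value])
    # Lead octet starts with n one bits followed by a zero bit.
    lead_prefix = (0xFF << (8 - n)) & 0xFF
    out = []
    for _ in range(n - 1):
        # Each trailing octet carries six payload bits: 10xxxxxx.
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    out.append(lead_prefix | value)
    return bytes(reversed(out))
```

Under these assumptions, rfc2279_encode(0x7FFFFFFF) yields the six octets
FD BF BF BF BF BF, while rfc2279_encode(0xFFFFFFFF) raises ValueError and
is_utf32_code_unit(0xFFFFFFFF) is False.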

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
Received on Mon May 11 2015 - 12:46:35 CDT
