Re: Corrigendum #9

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Wed, 2 Jul 2014 23:59:55 +0100

On Wed, 2 Jul 2014 21:19:16 +0200
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> 2014-07-02 20:19 GMT+02:00 David Starner <prosfilaes_at_gmail.com>:
>
> > I might argue 11111111b for 0x00 in UTF-8 would be technically
> > legal
 
> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they will be indistinguishable
> from the NULL character that could be stored everywhere in the stream.

A 0xFF byte in a narrow character stream is returned by the interfaces
as 0x00FF (int is at least 16 bits wide), while the narrow character
end-of-stream value EOF is required to be negative, so the two remain
distinct as long as the result is kept in an int. Unfortunately, the
wide character end-of-stream marker WEOF is not required to be
negative, but it is not allowed to be a representable character. C
appears to prohibit U+FFFF as well as supplementary characters if
wchar_t is only 16 bits wide.
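
To make the narrow-character point concrete, here is a minimal sketch in C. It assumes 8-bit bytes and a working tmpfile(); the helper name read_ff_then_eof is made up for illustration and is not part of any library. It writes a single 0xFF byte, reads it back through fgetc(), then reads once more to hit end of stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): writes one 0xFF byte to a
   temporary file, then reads twice with fgetc(). The first read yields the
   byte, the second yields the end-of-stream value.
   Returns 0 on success, -1 if the temporary file could not be used. */
int read_ff_then_eof(int *byte_value, int *end_value)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fputc(0xFF, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* fgetc() returns the byte as an unsigned char converted to int,
       so 0xFF comes back as a non-negative value. */
    *byte_value = fgetc(fp);

    /* Nothing is left in the stream: fgetc() now returns EOF. */
    *end_value = fgetc(fp);

    /* The C standard requires EOF to be a negative int. */
    assert(EOF < 0);

    fclose(fp);
    return 0;
}
```

The distinction only survives if the result stays in an int. Assigning it to a char before comparing with EOF is the usual way to lose it, since on platforms where char is signed an 0xFF byte then compares equal to an EOF of -1.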

Richard.
Received on Wed Jul 02 2014 - 18:02:05 CDT
