On Wed, 2 Jul 2014 21:19:16 +0200
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
> 2014-07-02 20:19 GMT+02:00 David Starner <prosfilaes_at_gmail.com>:
>
> > I might argue 11111111b for 0x00 in UTF-8 would be technically
> > legal
> But the same C libraries are also using -1 as end-of-stream values
> and if they are converted to bytes, they will be indistinguishable
> from the NUL character that could be stored everywhere in the stream.
A 0xFF byte in a narrow character stream is returned by the interfaces
as 0x00FF (the byte is read as an unsigned char and converted to int,
which is at least 16 bits wide), while the narrow character
end-of-stream value EOF is required to be negative, so the two remain
distinguishable as long as the result is kept in an int. Unfortunately,
the wide character end-of-stream marker WEOF is not required to be
negative, but it is not allowed to be a representable character. C
appears to prohibit U+FFFF as well as supplementary characters if
wchar_t is only 16 bits wide.
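
The narrow-stream half of this can be checked with a minimal sketch. It
assumes 8-bit bytes and a working tmpfile(); the helper names
read_ff_then_eof and check_ff_is_not_eof are made up for illustration
and are not from the thread or from any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a single 0xFF byte to a temporary file,
   rewinds, and reads twice with fgetc. The first read should see the
   byte, the second should see end-of-stream. Returns 0 on success and
   -1 if the temporary file could not be created or written. */
int read_ff_then_eof(int *first, int *second)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (fputc(0xFF, f) == EOF) {
        fclose(f);
        return -1;
    }
    rewind(f);
    *first = fgetc(f);   /* byte as unsigned char converted to int */
    *second = fgetc(f);  /* nothing left to read: EOF */
    fclose(f);
    return 0;
}

/* Hypothetical check: the 0xFF byte and EOF are different int values. */
void check_ff_is_not_eof(void)
{
    int first = 0, second = 0;
    int rc = read_ff_then_eof(&first, &second);
    assert(rc == 0);
    assert(first == 0xFF);   /* 0x00FF, a non-negative value */
    assert(second == EOF);   /* required to be negative */
    assert(first != second);
}
```

Calling check_ff_is_not_eof() from main should pass silently. The
distinction only holds while the return value stays in an int; narrowing
it to a char before comparing against EOF is where the ambiguity
described in the quoted message can arise.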
Richard.
Received on Wed Jul 02 2014 - 18:02:05 CDT