On Thu, Feb 21, 2013 at 2:12 PM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:
> Microsoft chose WEOF=0xffff. I don't think it can easily be changed to
> a better value until an incompatible processor architecture is used.
> Changing it is likely to break existing executables and object
> libraries.
>
If this is true, it's certainly a poor choice, and might violate the C
standard. (I have not checked the actual standard for getwc(), wint_t, and
WEOF.)
> 16-bit wchar_t doesn't exactly support 21-bit Unicode.
Right -- that's why the standard library uses a separate type, wint_t,
which can be wider if necessary.
Nothing requires a library that processes 16-bit Unicode strings to use a
16-bit type for a single-character return value. Compare the standard C
getc(), which returns a *negative* EOF value in an integer type that is
wider than a byte.
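
To illustrate the getc() point, here is a minimal sketch (the helper names
count_bytes and demo_count are made up for this example). Because the
result is held in an int, the byte 0xFF comes back as 255 and cannot be
mistaken for the negative EOF. The analogous wide-character pattern would
hold the result of getwc() in a wint_t and compare it against WEOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts all bytes in a stream and, separately,
 * how many of them are 0xFF. */
static long count_bytes(FILE *fp, long *ff_count)
{
    long total = 0;
    int c;                      /* int, not char: must hold 0..255 and EOF */

    assert(fp != NULL && ff_count != NULL);
    *ff_count = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == 0xFF)          /* a real data byte, distinct from EOF */
            ++*ff_count;
        ++total;
    }
    return total;
}

/* Hypothetical demo: writes the bytes 'A', 0xFF, 'B' to a temporary file
 * and counts them. Returns -1 if no temporary file could be created. */
static long demo_count(long *ff_count)
{
    static const unsigned char data[] = { 'A', 0xFF, 'B' };
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    fwrite(data, 1, sizeof data, fp);
    rewind(fp);
    n = count_bytes(fp, ff_count);
    fclose(fp);
    return n;
}
```

If c were declared as a char instead, the 0xFF byte could compare equal to
EOF on platforms where char is signed, and the loop would stop early. That
is the same kind of collision as a 16-bit WEOF of 0xffff.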
> The UTC is now applying additional pressure for the making of the
> distinction between UTF-16 and UTF-16LE.
The UTC is doing no such thing. Nothing has changed with regard to the
UTF-16 encoding scheme and the BOM.
U+FFFE has always been a code point that will never have a real character
assigned to it. That is why it is *unlikely* to appear as the first
character in a text file, and thus useful as a "reverse BOM". However, it
was never forbidden from occurring in the text.
Best practice for file encodings has always been to declare the encoding.
Second best for UTF-16 is to always include the BOM, even if the byte order
is big-endian. And since most computers are little-endian, they need to
include the BOM in UTF-16 file encodings anyway (if they use their native
endianness).
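
As a rough sketch of how a reader might use the BOM when no encoding is
declared (the enum and the function name sniff_utf16_bom are hypothetical,
not from any particular library):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the sketch below. */
enum utf16_order { UTF16_NO_BOM, UTF16_BE, UTF16_LE };

/* Looks at the first two bytes of a buffer that is already believed to
 * hold UTF-16. U+FEFF serialized big-endian is FE FF; serialized
 * little-endian it is FF FE, which read big-endian would be U+FFFE. */
static enum utf16_order sniff_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return UTF16_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return UTF16_LE;
    }
    return UTF16_NO_BOM;        /* caller falls back to a declared encoding */
}
```

This is only a heuristic. The bytes FF FE also begin the UTF-32LE BOM
(FF FE 00 00), and since U+FFFE is unlikely but not forbidden in text, a
declared encoding should take precedence over sniffing.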
markus
Received on Thu Feb 21 2013 - 17:30:09 CST