From: Tim Greenwood (timg1952@aol.com)
Date: Thu Dec 11 2003 - 10:57:47 EST
In my interpretation of the C standard (which I am reading from
http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf), UTF-8 is not a
valid wchar_t encoding if your execution character set contains
characters outside the C0 Controls and Basic Latin range (U+0000 to
U+007F), and UTF-16 is not a valid wchar_t encoding if your execution
character set contains characters outside the BMP. In other words,
whatever you consider to be a character (which may be a combining
character) must be encoded in a single wchar_t code unit.
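As a rough illustration of that argument, the C sketch below counts how many code units a Unicode code point needs in UTF-8 and in UTF-16. The function names utf8_units and utf16_units are hypothetical, written only for this example, and the code assumes its input is a Unicode scalar value (at most U+10FFFF, excluding the surrogates U+D800 to U+DFFF). Only U+0000 to U+007F fit in one UTF-8 unit, and only BMP code points fit in one UTF-16 unit, which is where the limits above come from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if cp is a Unicode scalar value
   (in range and not a surrogate code point). */
int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Hypothetical helper: number of UTF-8 code units (bytes) for cp. */
int utf8_units(uint32_t cp)
{
    assert(is_scalar_value(cp));
    if (cp <= 0x7F)   return 1;  /* C0 Controls and Basic Latin only */
    if (cp <= 0x7FF)  return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;
}

/* Hypothetical helper: number of UTF-16 code units for cp.
   Anything outside the BMP needs a surrogate pair. */
int utf16_units(uint32_t cp)
{
    assert(is_scalar_value(cp));
    return cp <= 0xFFFF ? 1 : 2;
}
```

For example, U+00E9 (é) needs two UTF-8 units but one UTF-16 unit, while U+1D11E (MUSICAL SYMBOL G CLEF) needs four UTF-8 units and a two-unit surrogate pair in UTF-16, so neither could be the value of a single wchar_t in those encodings.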
The relevant passage (paragraph 11) is:

  A wide character constant has type wchar_t, an integer type defined
  in the <stddef.h> header. The value of a wide character constant
  containing a single multibyte character that maps to a member of the
  extended execution character set is the wide character (code)
  corresponding to that multibyte character, as defined by the mbtowc
  function, with an implementation-defined current locale. The value of a
  wide character constant containing more than one multibyte character, or
  containing a multibyte character or escape sequence not represented in
  the extended execution character set, is implementation-defined.
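To see the "as defined by the mbtowc function" wording in action, here is a minimal sketch that checks whether mbtowc turns a multibyte string into exactly one wchar_t equal to a given wide character constant. The function name constant_matches_mbtowc is hypothetical. It assumes the program is still in the "C" locale (the default at startup) and sticks to basic-character-set characters, since the compiler's locale for L'...' is implementation-defined and need not match the run-time locale for anything else.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if mb holds exactly one multibyte
   character and mbtowc maps it to the same value as the wide
   character constant wc; returns 0 otherwise. */
int constant_matches_mbtowc(const char *mb, wchar_t wc)
{
    wchar_t converted;
    size_t len;
    int used;

    assert(mb != NULL);
    len = strlen(mb);
    mbtowc(NULL, NULL, 0);              /* reset any shift state */
    used = mbtowc(&converted, mb, len); /* bytes consumed, or -1 */

    /* One multibyte character must account for the whole string
       and yield a single wchar_t equal to wc. */
    return used > 0 && (size_t)used == len && converted == wc;
}
```

Under those assumptions, constant_matches_mbtowc("A", L'A') should return 1, while a string holding two characters such as "AB" returns 0 because mbtowc consumes only the first one.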
Tim