Yves Arrouye <yves@realnames.com> writes on Fri, 6 Apr 2001 15:52:59 -0700:
>> Does anybody know if the C++ standard specifies how many hex digits
>> max this escape can have? And doesn't the standard say something
>> like \u is for wchar_t, which may not be Unicode (I hope I'm wrong
>> here)?
Here is what
        INTERNATIONAL STANDARD ISO/IEC 14882
        First edition 1998-09-01
        Programming languages -- C++
        Langages de programmation -- C++
        http://www.iso.ch/cate/d25845.html
        https://webstore.ansi.org/
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%2D1998
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%3A1998
has to say:
>> ...
>> 	The universal-character-name construct provides a way to name other
>> 	characters.
>>
>> 	hex-quad:
>> 		hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
>>
>> 	universal-character-name:
>> 		\u hex-quad
>> 		\U hex-quad hex-quad
>>
>> 	The character designated by the universal-character-name \UNNNNNNNN is
>> 	that character whose character short name in ISO/IEC 10646 is
>> 	NNNNNNNN; the character designated by the universal-character-name
>> 	\uNNNN is that character whose character short name in ISO/IEC 10646
>> 	is 0000NNNN.  If the hexadecimal value for a universal character name
>> 	is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the
>> 	universal character name designates a character in the basic source
>> 	character set, then the program is ill-formed.
>> ...
Thus, \u takes exactly four hex digits and \U exactly eight, and both
designate ISO/IEC 10646 characters, not characters from some other
set.  However, from a quick skim it is not clear to me that wchar_t is
necessarily big enough to hold every character from that set.
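One way to probe that last point on a particular implementation is
sketched below.  This is my own illustration, not text from either
Standard: the function names wchar_can_hold and clef_units are
hypothetical, and the sketch assumes a C99 compiler that accepts \U
escapes in wide string literals.  It compares WCHAR_MAX against a
given code value, and counts how many wchar_t elements the compiler
generated for a single \U escape.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* WCHAR_MAX, uintmax_t */
#include <wchar.h>    /* wcslen */

/* Hypothetical helper: returns 1 if wchar_t can represent every value
 * from 0 up to and including max_code, and 0 otherwise. */
int wchar_can_hold(unsigned long max_code)
{
    return (uintmax_t)WCHAR_MAX >= (uintmax_t)max_code;
}

/* Hypothetical helper: number of wchar_t elements the compiler
 * generated for the one character \U0001D11E.  A result greater
 * than 1 means the character did not fit in a single wchar_t. */
size_t clef_units(void)
{
    static const wchar_t clef[] = L"\U0001D11E";
    return wcslen(clef);
}
```

Calling wchar_can_hold(0x7FFFFFFFUL) asks whether wchar_t spans the
full 31-bit ISO/IEC 10646 code space, and wchar_can_hold(0xFFFFUL)
asks only about the first 65536 code values.  The answers are
properties of the implementation, so they may differ from one
compiler or platform to another.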
The C99 Standard
        INTERNATIONAL STANDARD ISO/IEC 9899
        Second edition 1999-12-01
        Programming languages -- C
        Langages de programmation -- C
        http://www.iso.ch/cate/d29237.html
        http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+9899%3A1999
has essentially the same text as the C++98 Standard for the meaning of
\u and \U, and it too is vague about what wchar_t represents.
The C99 Standard then goes on to define:
>> ...
>> 	__STDC_ISO_10646__
>> 		An integer constant of the form yyyymmL (for example,
>> 		199712L), intended to indicate that values of type
>> 		wchar_t are the coded representations of the
>> 		characters defined by ISO/IEC 10646, along with all
>> 		amendments and technical corrigenda as of the
>> 		specified year and month.
>> ...
This symbol is not defined in C++98.  Evidently, it was introduced so
that programmers would have a way of finding out whether or not
wchar_t holds ISO/IEC 10646 values.
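A minimal sketch of such a test follows.  The function name
iso_10646_version is hypothetical, and returning 0 when the macro is
absent is a convention I chose for this example, not something either
Standard specifies.

```c
/* Hypothetical helper: returns the yyyymm value of __STDC_ISO_10646__
 * when the implementation defines it, and 0 when it does not. */
long iso_10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;   /* macro absent: no promise about wchar_t values */
#endif
}
```

A nonzero result means the implementation promises that wchar_t
values are ISO/IEC 10646 code values as of that year and month.  A
zero result means only that no such promise is made, not that wchar_t
is known to hold something else.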
-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah                    Internet e-mail: beebe@math.utah.edu  -
- Department of Mathematics, 322 INSCC      beebe@acm.org  beebe@computer.org -
- 155 S 1400 E RM 233                       beebe@ieee.org                    -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------