Yves Arrouye <yves@realnames.com> writes on Fri, 6 Apr 2001 15:52:59 -0700:
>> Does anybody know if the C++ standard specified how many hex digits
>> max this escape can have? And doesn't the standard say something
>> like \u is for wchar_t, which may not be Unicode (I hope I'm wrong
>> here)?
Here is what
INTERNATIONAL STANDARD ISO/IEC 14882
First edition 1998-09-01
Programming languages -- C++
Langages de programmation -- C++
http://www.iso.ch/cate/d25845.html
https://webstore.ansi.org/
http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%2D1998
http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+14882%3A1998
has to say:
>> ...
>> The universal-character-name construct provides a way to name other
>> characters.
>>
>> hex-quad:
>>     hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
>>
>> universal-character-name:
>>     \u hex-quad
>>     \U hex-quad hex-quad
>>
>> The character designated by the universal-character-name \UNNNNNNNN is
>> that character whose character short name in ISO/IEC 10646 is
>> NNNNNNNN; the character designated by the universal-character-name
>> \uNNNN is that character whose character short name in ISO/IEC 10646
>> is 0000NNNN. If the hexadecimal value for a universal character name
>> is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the
>> universal character name designates a character in the basic source
>> character set, then the program is ill-formed.
>> ...
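To make that last rule concrete, here is a small C function that applies it to the value of a universal-character-name. It is only a sketch of the rule as quoted above, not code taken from any compiler. The name ucn_ill_formed_cxx98 is my own, and the code assumes the host character set is ASCII-compatible, so that characters can be compared directly with code points.

```c
#include <stdbool.h>
#include <string.h>

/* The 91 graphic characters of the C++98 basic source character set;
   space, horizontal tab, vertical tab, form feed and new-line make 96.
   Assumes an ASCII-compatible host character set. */
static const char basic_graphic[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

/* Returns true if a UCN with hexadecimal value v would make a C++98
   program ill-formed under the rule quoted above. */
bool ucn_ill_formed_cxx98(unsigned long v)
{
    if (v < 0x20UL)
        return true;                 /* C0 control characters */
    if (v >= 0x7FUL && v <= 0x9FUL)
        return true;                 /* DEL and C1 control characters */
    if (v == 0x20UL)
        return true;                 /* space is in the basic set */
    /* Remaining printable ASCII: ill-formed only if in the basic set. */
    return v < 0x7FUL && strchr(basic_graphic, (int)v) != NULL;
}
```

With this check, every value from 0x00 through 0x9F is rejected except 0x24 ($), 0x40 (@) and 0x60 (`), which are printable ASCII but not members of the basic source character set.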
Thus, \u takes exactly four hexadecimal digits and \U exactly eight,
and both name characters of ISO/IEC 10646, not of some other character
set. However, on a quick skim I could not find anything requiring
wchar_t to be big enough to hold every character in that set.
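Because the digit counts are fixed rather than maximums, a hex digit that follows a complete \u escape is just an ordinary character. A C99 sketch illustrating this (the array and function names are made up for illustration). The length checks assume that é maps to a single wchar_t, which holds on common implementations but is exactly what neither standard promises:

```c
#include <stddef.h>
#include <wchar.h>

/* \u consumes exactly four hex digits, so the trailing F is a separate
   character: two wide characters before the terminator. */
static const wchar_t e_acute_then_F[] = L"\u00E9F";

/* \U consumes exactly eight hex digits; \U000000E9 and \u00E9 both name
   the character whose ISO/IEC 10646 short name is 000000E9. */
static const wchar_t e_acute_long[] = L"\U000000E9";

/* Both lengths assume the character fits in one wchar_t. */
size_t short_form_length(void) { return wcslen(e_acute_then_F); }
size_t long_form_length(void)  { return wcslen(e_acute_long); }
```

Note that the sketch only compares the two spellings with each other. Whether L'\u00E9' actually has the numeric value 0xE9 is the open question about wchar_t.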
The C99 Standard
INTERNATIONAL STANDARD ISO/IEC 9899
Second edition 1999-12-01
Programming languages -- C
Langages de programmation -- C
http://www.iso.ch/cate/d29237.html
http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FIEC+9899%3A1999
has essentially the same text as the C++98 Standard for the meaning of
\u and \U, and it too is vague about what wchar_t represents.
The C99 Standard then goes on to define:
>> ...
>> __STDC_ISO_10646__
>> An integer constant of the form yyyymmL (for example,
>> 199712L), intended to indicate that values of type
>> wchar_t are the coded representations of the
>> characters defined by ISO/IEC 10646, along with all
>> amendments and technical corrigenda as of the
>> specified year and month.
>> ...
This symbol is not defined in C++98; evidently it was introduced so
that programmers would have a way of finding out whether or not
wchar_t holds ISO/IEC 10646 values.
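In C99 the macro can be tested directly with the preprocessor. A minimal sketch, with function names of my own choosing; on systems that do not define the macro, both functions simply report that nothing is guaranteed:

```c
#include <wchar.h>  /* on glibc the macro comes from <stdc-predef.h>,
                       which standard headers such as this one pull in */

/* Returns the yyyymmL value of __STDC_ISO_10646__, or 0 if the
   implementation makes no promise about what wchar_t values mean. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* When the macro is defined, returns the result of comparing L'\u00E9'
   with 0xE9 (the standard says it must be 1); otherwise returns -1. */
int e_acute_is_10646_value(void)
{
#ifdef __STDC_ISO_10646__
    return L'\u00E9' == 0xE9;
#else
    return -1;
#endif
}
```

A C++98 program can use the same #ifdef, but it is then relying on its implementation, since that standard does not define the symbol.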
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- Center for Scientific Computing FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah Internet e-mail: beebe@math.utah.edu -
- Department of Mathematics, 322 INSCC beebe@acm.org beebe@computer.org -
- 155 S 1400 E RM 233 beebe@ieee.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe -
-------------------------------------------------------------------------------