From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Nov 22 2010 - 13:08:37 CST
On 11/22/2010 10:18 AM, Phillips, Addison wrote:
>> sowmya satyanarayana <sowmya underscore satyanarayana at yahoo dot com>
>> wrote:
>>
>>> Taking this, what is the best way to define _T(x) macro of UNICODE version,
>>> so that my strings will always be 2 byte wide character?
>> Unicode characters aren't always 2 bytes wide. Characters with values
>> of U+10000 and greater take two UTF-16 code units, and are thus 4 bytes
>> wide in UTF-16.
>>
> Not exactly. The code units for UTF-16 are always 16 bits wide. Supplementary characters (those with code points >= U+10000) use a surrogate pair, which is two 16-bit code units. Most processing and string traversal is in terms of the 16-bit code units, with a special case for the surrogate pairs.
>
> It is very useful when discussing Unicode character encoding forms to distinguish between characters ("code points") and their in-memory representation ("code units"), rather than using non-specific terminology such as "character".
>
> If you want to use UTF-32, which uses 32-bit code units, one per code point, you can use a 32-bit data type instead. Those are always 4 bytes wide.
The question is relevant to the C and C++ languages.
What is asked: which native data type do I use to make sure I end up
with a 16-bit code unit?
The usual way a _T macro is used is
TCHAR c = _T('x');
const TCHAR * s = _T("x");
that is, to wrap a string or character literal so that it can be used
either as a Unicode literal or as a non-Unicode literal, depending on
whether some global compile-time flag (usually UNICODE or _UNICODE) is
set or not.
The usual way a _T macro is defined is something like:
#ifdef UNICODE
#define _T(x) L##x
#else
#define _T(x) x
#endif
That definition relies on the compiler implementing L'x' and L"string"
with 16-bit UTF-16 code units, which the C and C++ standards do not
guarantee.
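
To make that dependency visible, here is a minimal C++11 sketch. The names MY_UNICODE, MY_TCHAR and MY_T are hypothetical stand-ins for UNICODE, TCHAR and _T, chosen so they cannot collide with any platform header. It only reports how wide the code unit behind an L-prefixed literal is on the compiler at hand; it does not make it 16 bits.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical stand-in for the UNICODE / _UNICODE compile-time flag.
#define MY_UNICODE 1

#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;   // wide build: width is chosen by the compiler
#define MY_T(x) L##x
#else
typedef char MY_TCHAR;      // narrow build
#define MY_T(x) x
#endif

// Width in bits of one code unit of the literal type selected above.
constexpr std::size_t my_tchar_bits() {
    return sizeof(MY_TCHAR) * CHAR_BIT;
}

// Sanity check: a wrapped character literal matches the first unit
// of the equivalent wrapped string literal.
inline void check_literals() {
    const MY_TCHAR *s = MY_T("x");
    MY_TCHAR c = MY_T('x');
    assert(s[0] == c);
}
```

Printing my_tchar_bits() typically gives 16 with Windows compilers and 32 on most Unix-like systems, which is exactly why the macro alone cannot promise a 2-byte code unit.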
A few years ago, there was a proposal to amend the C standard to have a
way to ensure that this is the case in a cross platform way. I can't
recall offhand what became of it.
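
Independent of which proposal that was, compilers that implement C++11 offer the char16_t type and the u"..." literal prefix, and in C++11 such literals are encoded as UTF-16 (C11 has the same type in <uchar.h>). The sketch below assumes such a compiler; the macro name U16 and the helper functions are hypothetical, made up for illustration. It also shows the distinction drawn earlier in the thread between code units and code points.

```cpp
#include <cstddef>

// Hypothetical macro: wraps a literal so it is always built from
// char16_t code units, regardless of how wide wchar_t is.
#define U16(x) u##x

// Number of code units in a NUL-terminated char16_t string.
inline std::size_t u16_length(const char16_t *s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// High (lead) surrogates are U+D800..U+DBFF, low (trail) U+DC00..U+DFFF.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Number of code points, counting a well-formed surrogate pair as one.
inline std::size_t u16_code_points(const char16_t *s) {
    std::size_t n = 0;
    for (std::size_t i = 0; s[i] != 0; ++i) {
        // s[i + 1] is safe to read: at worst it is the terminating NUL.
        if (is_high_surrogate(s[i]) && is_low_surrogate(s[i + 1])) ++i;
        ++n;
    }
    return n;
}

// One code unit plus the terminator.
static_assert(sizeof(U16("x")) == 2 * sizeof(char16_t),
              "u\"x\" holds one code unit and a terminator");
```

With this, U16("a\U00010000") holds three code units but only two code points, since U+10000 is stored as a surrogate pair. Note that char16_t is only guaranteed to be at least 16 bits wide, although it is exactly 16 on common platforms.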
A./