Re: UNICODE version of _T(x) macro

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Nov 22 2010 - 13:08:37 CST


    On 11/22/2010 10:18 AM, Phillips, Addison wrote:
    >> sowmya satyanarayana<sowmya underscore satyanarayana at yahoo dot
    >> com>
    >> wrote:
    >>
    >>> Taking this, what is the best way to define _T(x) macro of
    >> UNICODE version, so
    >>> that my strings will always be
    >>> 2 byte wide character?
    >> Unicode characters aren't always 2 bytes wide. Characters with
    >> values
    >> of U+10000 and greater take two UTF-16 code units, and are thus 4
    >> bytes
    >> wide in UTF-16.
    >>
    > Not exactly. The code units for UTF-16 are always 16-bits wide. Supplementary characters (those with code points >= U+10000) use a surrogate pair, which are two 16-bit code units. Most processing and string traversal is in terms of the 16-bit code units, with a special case for the surrogate pairs.
    >
    > It is very useful when discussing Unicode character encoding forms to distinguish between characters ("code points") and their in memory representation ("code units"), rather than using non-specific terminology such as "character".
    >
    > If you want to use UTF-32, which uses 32-bit code units, one per code point, you can use a 32-bit data type instead. Those are always 4 bytes wide.
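
    As a concrete illustration of the code point / code unit distinction,
    here is a small C sketch. It assumes a compiler, such as Microsoft's,
    whose wide literals use 16-bit UTF-16 code units; the particular
    character chosen is arbitrary.

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* U+1D11E MUSICAL SYMBOL G CLEF is a supplementary character:
           one code point, but two UTF-16 code units (a surrogate pair),
           spelled out here as the explicit pair. */
        const wchar_t *clef = L"\xD834\xDD1E";

        printf("code units: %u\n", (unsigned)wcslen(clef));      /* 2 */
        printf("bytes per code unit: %u\n",
               (unsigned)sizeof(wchar_t));  /* 2 on Windows, typically 4
                                               on Unix-like systems */
        return 0;
    }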

    The question is relevant to the C and C++ languages.

    What is asked: which native data type do I use to make sure I end up
    with 16-bit code units.

    The usual way a _T macro is used is

    TCHAR c = _T('x');
    TCHAR * s = _T("x");

    that is, to wrap a string or character literal so that it can be used
    either as a Unicode literal or as a non-Unicode literal, depending on
    whether some global compile-time flag (usually UNICODE or _UNICODE) is
    set or not.

    The usual way a _T macro is defined is something like:

    #ifdef UNICODE
    #define _T(x) L##x
    #else
    #define _T(x) x
    #endif

    That definition relies on the compiler implementing L'x' or L"string"
    literals with UTF-16 code units.
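
    Put together, a minimal self-contained sketch looks like the
    following. The TCHAR typedef mirrors the Windows <tchar.h>
    convention but is written out here for illustration, not quoted
    from that header; the size check shows that the macro by itself
    guarantees nothing about the width of the code units, since that
    depends entirely on the compiler's wchar_t.

    #include <stdio.h>
    #include <wchar.h>

    #ifdef UNICODE
    typedef wchar_t TCHAR;    /* 16-bit UTF-16 code unit on Windows,
                                 but 32 bits with most Unix compilers */
    #define _T(x) L##x
    #else
    typedef char TCHAR;
    #define _T(x) x
    #endif

    int main(void)
    {
        const TCHAR *s = _T("x");
        (void)s;
        printf("sizeof(TCHAR) = %u\n", (unsigned)sizeof(TCHAR));
        return 0;
    }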

    A few years ago, there was a proposal to amend the C standard with a
    cross-platform way to guarantee that this is the case. I can't
    recall offhand what became of it.
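
    For what it's worth, the C1X and C++0x drafts do provide a facility
    along those lines: the char16_t type with u"..." literals, whose
    code units are 16 bits regardless of the platform's wchar_t.
    Whether that is the same proposal, I couldn't say. A quick sketch,
    assuming a compiler that already supports it:

    #include <stdio.h>
    #include <uchar.h>   /* char16_t in C; a built-in type in C++ */

    int main(void)
    {
        /* u"..." literals use 16-bit code units independent of wchar_t. */
        const char16_t *s = u"x";
        (void)s;
        printf("sizeof(char16_t) = %u\n",
               (unsigned)sizeof(char16_t));  /* 2 on typical platforms */
        return 0;
    }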

    A./


