From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Apr 30 2003 - 18:43:20 EDT
Hi all, I am wondering how developers get 16-bit string *literals* into C source code. Do you use a
mechanism other than the following?
In the following, I use UChar as an example typedef name for the type of 16-bit Unicode strings
(usually the same as unsigned short).
Escapes for non-ASCII characters would be ok. UTF-8/16 for the source code would be nicer. Whatever
the mechanism, it has to work on a non-ASCII platform, too.
I am aware that there is an effort under way to add 16-bit Unicode string literals to the C
standard; I am looking for what can be done today.
I know of
a) array of numeric constants
const UChar string[]={ 0x61, 0x62, 0x20ac };
b) array of numeric constants expressed as named constants
enum { _a=0x61, _b, _c, ..., _Euro=0x20ac, ... };
const UChar string[]={ _a, _b, _Euro };
c) on some lucky platforms with 16-bit-Unicode wchar_t, simply
const UChar *string=L"ab\x20ac";
or even
const UChar *string=L"ab€";
-> but this is not portable
d) using a preprocessor which takes source code like
const UChar *string=U16LITERAL("ab\u20ac");
or
const UChar *string=U16LITERAL("ab€");
and generates output C source code like a) or c) as appropriate
-> Are there such preprocessors available?
I guess Perl could do this...
e) using a tool as in d) but only per-string for the developer,
where one can type "ab€" and the tool generates output
text like in a) to copy-paste into the .c file,
possibly with a comment containing the original string
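To make e) concrete, here is a minimal sketch of such a per-string tool. It is only an illustration, not an existing program: the function names utf8_to_utf16 and emit_u16_array are made up, the input is assumed to be UTF-8, and the decoder is deliberately simple (it does not reject overlong forms or encoded surrogates). Because it works on the UTF-8 bytes rather than on the compiler's character set, the numbers it prints are Unicode values regardless of the platform charset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned short UChar;   /* assumed 16-bit, as in the post */

/* Decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed, or 0 if the sequence is malformed.
   Minimal sketch: overlong forms and encoded surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xe0) == 0xc0) { *cp = s[0] & 0x1f; n = 2; }
    else if ((s[0] & 0xf0) == 0xe0) { *cp = s[0] & 0x0f; n = 3; }
    else if ((s[0] & 0xf8) == 0xf0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;
    for (i = 1; i < n; ++i) {
        /* a NUL byte fails this test, so we never read past the end */
        if ((s[i] & 0xc0) != 0x80) return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3f);
    }
    return n;
}

/* Convert a NUL-terminated UTF-8 string to NUL-terminated UTF-16 units.
   Returns the number of units written (without the terminator), or -1 on
   malformed input or if out (capacity cap units) is too small. */
static int utf8_to_utf16(const char *utf8, UChar *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t len = 0;
    assert(utf8 != NULL && out != NULL);
    if (cap == 0) return -1;
    while (*s != 0) {
        unsigned long cp;
        size_t n = decode_utf8(s, &cp);
        if (n == 0 || cp > 0x10ffffUL) return -1;
        s += n;
        if (cp >= 0x10000UL) {
            /* supplementary code point: emit a surrogate pair */
            if (len + 2 >= cap) return -1;
            cp -= 0x10000UL;
            out[len++] = (UChar)(0xd800 + (cp >> 10));
            out[len++] = (UChar)(0xdc00 + (cp & 0x3ff));
        } else {
            if (len + 1 >= cap) return -1;
            out[len++] = (UChar)cp;
        }
    }
    out[len] = 0;
    return (int)len;
}

/* Print a declaration in the style of a), preceded by a comment holding the
   original text. The text must not contain the comment terminator itself.
   Returns the number of code units, or -1 on error. */
static int emit_u16_array(FILE *f, const char *name, const char *utf8)
{
    UChar buf[256];
    int i, n = utf8_to_utf16(utf8, buf, sizeof buf / sizeof buf[0]);
    if (n < 0) return -1;
    fprintf(f, "/* \"%s\" */\n", utf8);
    fprintf(f, "const UChar %s[]={ ", name);
    for (i = 0; i < n; ++i) fprintf(f, "0x%x, ", (unsigned)buf[i]);
    fprintf(f, "0 };\n");
    return n;
}
```

Calling emit_u16_array(stdout, "string", text) with the UTF-8 bytes of "ab€" as text would print the comment line followed by an initializer containing 0x61, 0x62, 0x20ac and a terminating 0, ready to paste into the .c file.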
I am *not* looking for ways to get strings via more high-level mechanisms and runtime functions like
z1) not using string literals but resource bundles/message catalogs etc.
z2) using an unescape function
const UChar *string=unescape("ab\\u20ac");
etc.
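For contrast, and only to show what is meant by z2), a runtime unescape function could look roughly like the sketch below. The signature is hypothetical and differs from the one-argument call above in that the caller supplies the output buffer. It also assumes an ASCII-based execution character set for the unescaped characters, which is exactly why it does not solve the non-ASCII platform case.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short UChar;   /* assumed 16-bit, as in the post */

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical unescape: copy src to out, turning each \uXXXX escape into
   one 16-bit unit and every other char into its (assumed ASCII) value.
   Returns the number of units written (without the terminator), or -1 on a
   malformed escape or if out (capacity cap units) is too small. */
static int unescape(const char *src, UChar *out, size_t cap)
{
    size_t len = 0;
    assert(src != NULL && out != NULL);
    if (cap == 0) return -1;
    while (*src != '\0') {
        unsigned unit;
        if (src[0] == '\\' && src[1] == 'u') {
            int i;
            unit = 0;
            for (i = 2; i < 6; ++i) {
                /* a premature NUL is not a hex digit, so this stops in time */
                int h = hex_value(src[i]);
                if (h < 0) return -1;
                unit = (unit << 4) | (unsigned)h;
            }
            src += 6;
        } else {
            unit = (unsigned char)*src++;
        }
        if (len + 1 >= cap) return -1;
        out[len++] = (UChar)unit;
    }
    out[len] = 0;
    return (int)len;
}
```

The conversion happens every time the code runs and needs a buffer, which is the overhead that a true compile-time literal would avoid.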
Tips are greatly appreciated.
markus