C string literals with 16-bit Unicode

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Apr 30 2003 - 18:43:20 EDT

Hi all, I am wondering how developers get 16-bit string *literals* into C source code. Do you use a
mechanism other than the ones listed below?

Below, I use UChar as an example typedef name for the 16-bit code unit type of Unicode strings
(usually the same as unsigned short).

Escapes for non-ASCII characters would be OK; UTF-8 or UTF-16 for the source code would be nicer.
Whatever the mechanism, it has to work on a non-ASCII platform, too.

I am aware that there is an effort under way to add 16-bit Unicode string literals to the C
standard; I am looking for what can be done today.

I know of

a) array of numeric constants
     const UChar string[]={ 0x61, 0x62, 0x20ac };

b) array of numeric constants expressed as named constants
     enum { _a=0x61, _b, _c, ..., _Euro=0x20ac, ... };
     const UChar string[]={ _a, _b, _Euro };

c) on some lucky platforms with 16-bit-Unicode wchar_t, simply
     const UChar *string=L"ab\x20ac";
   or even
     const UChar *string=L"ab€";

-> but this is not portable

d) using a preprocessor which takes source code like
     const UChar *string=U16LITERAL("ab\u20ac");
   or
     const UChar *string=U16LITERAL("ab€");
   and generates output C source code like a) or c) as appropriate

-> Are there such preprocessors available?
I guess Perl could do this...

e) using a tool as in d) but only per-string for the developer,
    where one can type "ab€" and the tool generates output
    text like in a) to copy-paste into the .c file,
    possibly with a comment containing the original string
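
A per-string tool as in e) could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not an existing tool: the function name utf8_to_uchar_initializer is hypothetical, the input is assumed to be UTF-8 on the developer's machine, and a terminating 0 is appended to the generated initializer (the example in a) does not have one). Because the output contains only numeric constants, it should compile the same way on ASCII and non-ASCII platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at *p and advance *p past it.
 * Returns the code point, or -1 on malformed input. */
static long decode_utf8(const unsigned char **p)
{
    static const long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = *p;
    long cp;
    int extra, i;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else return -1;

    for (i = 1; i <= extra; i++) {
        /* A premature NUL fails this test, so we never read past the end. */
        if ((s[i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (cp < min_cp[extra] || cp > 0x10FFFF) return -1;
    if (cp >= 0xD800 && cp <= 0xDFFF) return -1;

    *p = s + extra + 1;
    return cp;
}

/* Append text to out, always leaving room for the terminating NUL. */
static int append(char *out, size_t out_size, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (len >= out_size - *used) return -1;
    memcpy(out + *used, text, len + 1);
    *used += len;
    return 0;
}

/* Append one 16-bit code unit formatted as "0x...., ". */
static int append_unit(char *out, size_t out_size, size_t *used, unsigned unit)
{
    char tmp[16];
    snprintf(tmp, sizeof tmp, "0x%x, ", unit);
    return append(out, out_size, used, tmp);
}

/* Hypothetical helper: turn a UTF-8 string into the text of an array
 * initializer as in a), e.g. "{ 0x61, 0x62, 0x20ac, 0 }".
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int utf8_to_uchar_initializer(const char *utf8, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t used = 0;

    assert(utf8 != NULL && out != NULL);
    if (append(out, out_size, &used, "{ ") != 0) return -1;

    while (*p != 0) {
        long cp = decode_utf8(&p);
        if (cp < 0) return -1;
        if (cp >= 0x10000) {
            /* Supplementary code points become a surrogate pair. */
            cp -= 0x10000;
            if (append_unit(out, out_size, &used, 0xD800 + (unsigned)(cp >> 10)) != 0) return -1;
            if (append_unit(out, out_size, &used, 0xDC00 + (unsigned)(cp & 0x3FF)) != 0) return -1;
        } else {
            if (append_unit(out, out_size, &used, (unsigned)cp) != 0) return -1;
        }
    }
    /* Assumption: the generated array is 0-terminated. */
    return append(out, out_size, &used, "0 }");
}
```

Given the UTF-8 bytes of "ab€", the function fills the buffer with { 0x61, 0x62, 0x20ac, 0 }, which can be pasted after "const UChar string[]=" together with a comment holding the original string. The same routine could serve as the core of a whole-file preprocessor as in d), although locating the U16LITERAL(...) calls in the source is not shown here.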

I am *not* looking for ways to get strings via higher-level mechanisms and runtime functions such as

z1) not using string literals but resource bundles/message catalogs etc.

z2) using an unescape function
const UChar *string=unescape("ab\\u20ac");

etc.
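
For contrast, the kind of runtime function meant in z2) might look roughly like the following sketch. The name unescape_u16 and its signature are hypothetical (it writes into a caller-supplied buffer instead of returning a pointer as in the one-line example), it handles only \uXXXX escapes, and it assumes an ASCII-based execution character set for the unescaped characters, so it would not satisfy the non-ASCII platform requirement as written.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short UChar;   /* example typedef, as in the message */

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch: convert a char string with \uXXXX escapes into
 * 16-bit code units at runtime. Returns the number of code units written
 * (not counting the terminating 0), or -1 on bad input or a full buffer. */
long unescape_u16(const char *src, UChar *dst, size_t dst_cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        unsigned unit;

        if (src[0] == '\\' && src[1] == 'u') {
            int i;
            unit = 0;
            for (i = 2; i < 6; i++) {
                int h = hex_value(src[i]);
                if (h < 0) return -1;   /* also stops at a premature NUL */
                unit = (unit << 4) | (unsigned)h;
            }
            src += 6;
        } else {
            /* Assumption: plain characters are ASCII and map 1:1. */
            unit = (unsigned char)*src++;
            if (unit > 0x7F) return -1;
        }

        if (n + 1 >= dst_cap) return -1;   /* keep room for the terminator */
        dst[n++] = (UChar)unit;
    }
    if (n >= dst_cap) return -1;
    dst[n] = 0;
    return (long)n;
}
```

Calling unescape_u16("ab\\u20ac", buf, 8) stores 0x61, 0x62, 0x20ac and a terminating 0 in buf. The conversion happens on every run and needs writable storage, which is what sets it apart from the compile-time options a) through e).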

Tips are greatly appreciated.

markus
