Re: Question about \uxxxx etc. for 21-bit code points - need advice

From: Marco Cimarosti
Date: Wed May 24 2000 - 10:04:03 EDT

Valeriy E. Ushakov wrote:
> It [a C compiler] knows source charset and execution charset
> and makes appropriate transcoding of string literals.

You are probably right for character literals like 'q' or for symbolic escape sequences like '\n'.

But my assumption is that this should never ever happen with explicit numeric codes: if I say '\x11' I want my character to be decimal 17, whatever the encoding(s) might be!
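
A minimal sketch of that assumption, for illustration only. The function names are hypothetical, and the claim that a numeric escape denotes its value directly, with no transcoding, is my reading of how C treats hexadecimal and octal escapes:

```c
#include <assert.h>

/* Hypothetical helper: the value of a hexadecimal escape is given by
   its digits, not by the source or execution character set. */
int hex_escape_value(void)
{
    return '\x11';
}

/* Hypothetical helper: the same value written as an octal escape. */
int octal_escape_value(void)
{
    return '\021';
}

/* Both spellings are expected to denote decimal 17. */
void check_numeric_escapes(void)
{
    assert(hex_escape_value() == 17);
    assert(octal_escape_value() == 17);
}
```

By contrast, the value of 'q' or '\n' is whatever the execution character set assigns to that character.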

I think that my assumption used to be correct until some point in the past. If things have changed, could you point me to some online document about the new stuff?

> You can split the string: "\x2""Two" -> "\x2""Deux". Since \x
> escapes are variable-length - it seems it's a good idea to
> always split strings after an \x escape.

This is good practice and I will adopt it; thanks for the hint.
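
A small sketch of the splitting trick, assuming that adjacent string literals are joined only after each escape has been parsed, so the \x escape ends at the closing quote. The array and function names are made up for the example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical messages: a control byte with value 2 followed by text.
   Written as "\x2Deux", the escape would swallow 'D' and 'e' as hex
   digits; splitting the literal ends the escape at the quote. */
const char msg_en[] = "\x2" "Two";
const char msg_fr[] = "\x2" "Deux";

/* Hypothetical helper: checks that a message starts with byte 2 and
   that the text after it is exactly `text`. */
int has_prefix_and_text(const char *msg, const char *text)
{
    return (unsigned char)msg[0] == 2 && strcmp(msg + 1, text) == 0;
}

void check_split_literals(void)
{
    assert(has_prefix_and_text(msg_en, "Two"));
    assert(has_prefix_and_text(msg_fr, "Deux"));
}
```

The translated string keeps its leading byte even though "Deux" begins with two characters that are valid hex digits.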

However, I would have preferred a smarter syntax that did not force programmers' creativity into such tricks.

> Also there *is* a fixed length hex escape in C:
> [...]
> universal-character-name:
> \u hex-quad
> \U hex-quad hex-quad
> [...]

That holds for the new \u and \U escapes, but not for the old \x escape, which is still variable-length.
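
To illustrate the difference, a short sketch assuming a compiler that accepts universal character names and an execution character set that can represent U+00E9. The names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* \u takes exactly four hex digits, so the trailing 'e' below is an
   ordinary character and not part of the escape. The same text after
   a \x escape would be read as further hex digits. */
const char ucn_then_e[] = "\u00e9e";

/* Hypothetical helper: returns the last character of a non-empty string. */
char last_char(const char *s)
{
    return s[strlen(s) - 1];
}

void check_fixed_length_ucn(void)
{
    /* How U+00E9 itself is stored depends on the execution character
       set, so only the trailing 'e' is checked. */
    assert(last_char(ucn_then_e) == 'e');
}
```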

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:03 EDT