From: Arcane Jill (arcanejill@ramonsky.com)
Date: Fri Jan 21 2005 - 07:32:38 CST
-----Original Message-----
From: Philippe Verdy [mailto:vpi92@yahoo.fr]
Sent: 21 January 2005 13:06
To: Arcane Jill
Cc: unicode@unicode.org
Subject: Re: 32'nd bit & UTF-8
>Arcane Jill <arcanejill@ramonsky.com> wrote:
>> The existence of wchar_t does not imply UTF-32. It does imply UTF-16.
That was a typo of course. It should have read "It does NOT imply UTF-16".
> I like this definition, but what is interesting here are the phrases
> "character set" and "supported by the compilation environment".
>
> "character set": the definition implies that this is necessarily a
> *coded* character set, because it makes an equation between what it
> calls a "character" and an "integer character constant". Unfortunately,
> the definition of "character" is weak. It does not have the same
> meaning as the "abstract character" defined in Unicode/ISO/IEC, so it
> could map to Unicode's "code units". This would make UTF-16 suitable.
>
> But if it needs to match with "abstract characters", then there's no
> choice for a C++ compiler: the integer datatype representing "wchar_t"
> must be able to contain at least as many distinct values as the ISO/IEC
> 10646 repertoire, and must contain the value 0.
Well, wchar_t on Windows is 16 bits wide, and hence /not/ able to contain as
many distinct values as the ISO/IEC 10646 repertoire. Gotta be code units then.
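
To make the "code units" point concrete, here is a rough sketch in C. The
function names (wchar_holds_any_code_point, utf16_encode) are made up for
illustration and come from no standard or library. The first compares
WCHAR_MAX with U+10FFFF; the second shows why a 16-bit unit needs two values
(a surrogate pair) for anything above U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if one wchar_t can hold every code point
   up to U+10FFFF on this implementation (false where wchar_t is 16 bits). */
int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Hypothetical helper: encode one code point as UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if cp is not a
   valid scalar value (a surrogate, or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;               /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                           /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With a 16-bit wchar_t, a wide string is then a sequence of such units rather
than of characters: U+1D11E, for instance, takes two of them.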
> The definition also does not say that the value 0 will necessarily be
> the same as a NULL character (U+0000). This depends on the "supported
> character set" in compile-time locales. There may as well exist a
> supported encoded charset that maps U+0000 to the integer value -2
> (because there's no requirement that integer values match ISO/IEC 10646
> codepoints). The definition relates only to the "null character" i.e.
> the one that "\0" maps to in string or character constants, but makes
> no assumption about whether this null matches the ISO10646 NULL (U+0000)
> character.
It is fortunate, then, that C was never implemented on the ZX80 or ZX81, for
which '\0' would have been the SPACE character (U+0020). (See
http://web.ukonline.co.uk/sinclair.zx81/appxa.html). On the ZX80/81, every
space would have terminated a string!
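
A toy sketch of what that would look like, for the curious. It assumes a
simplified version of the ZX81 table (space = 0, digits = 28..37, upper-case
letters = 38..63, everything else lumped onto 15) and an ASCII host; the
names zx81_code and zx81_strlen are invented for this example.

```c
#include <stddef.h>

/* Hypothetical helper: map an ASCII character to a ZX81-style code.
   Assumed, simplified table: space = 0, '0'..'9' = 28..37,
   'A'..'Z' = 38..63; anything else is mapped to 15. */
unsigned char zx81_code(char c)
{
    if (c == ' ')
        return 0;
    if (c >= '0' && c <= '9')
        return (unsigned char)(28 + (c - '0'));
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(38 + (c - 'A'));
    return 15;
}

/* Hypothetical helper: transcode an ASCII string (up to 63 characters)
   and then measure it the way strlen does, by scanning for code 0. */
size_t zx81_strlen(const char *ascii)
{
    unsigned char buf[64];
    size_t n = 0;
    size_t len = 0;

    while (ascii[n] != '\0' && n < sizeof buf - 1) {
        buf[n] = zx81_code(ascii[n]);
        n++;
    }
    buf[n] = 0;  /* the terminator, indistinguishable from a space */

    while (buf[len] != 0)
        len++;
    return len;
}
```

Under those assumptions zx81_strlen("HELLO WORLD") comes out as 5, because
the scan stops at the space.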
Fun, eh?
Jill