From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Nov 14 2004 - 17:45:56 CST
Doug Ewell said:
> instead of overloading the string type. Strings are for text. Text
> does not need nulls.
Nulls are legal Unicode characters, usable in plain text, as they have always
been in ASCII and in all ISO 8-bit charset standards. Why do you want a
legal Unicode string containing NULL (U+0000) *characters* to become illegal
when converted to a C string?
A null *CHARACTER* is valid in a C string, because C does not mandate the
string encoding (which varies according to locale conventions at run-time).
It just assigns a special role to the null *BYTE* as an end-of-string
terminator.
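To illustrate the character/byte distinction, here is a minimal sketch in C. The function names and the sample text "A", U+0000, "B" are hypothetical, chosen only for this example. It compares what strlen() reports when U+0000 is stored as the single byte 00 (as standard UTF-8 does) and when it is stored as the two-byte escape C0 80.

```c
#include <stddef.h>
#include <string.h>

/* "A", U+0000, "B" in standard UTF-8: U+0000 becomes the byte 00,
 * which C library functions treat as the end of the string. */
size_t length_with_raw_null(void)
{
    static const unsigned char raw[] = { 'A', 0x00, 'B', 0x00 };
    return strlen((const char *)raw);
}

/* The same text with U+0000 escaped as C0 80: no 00 byte appears
 * before the real terminator, so the whole text stays visible. */
size_t length_with_escaped_null(void)
{
    static const unsigned char escaped[] = { 'A', 0xC0, 0x80, 'B', 0x00 };
    return strlen((const char *)escaped);
}
```

With the raw form, strlen() stops after "A" and the "B" is lost to any C string function; with the escaped form, all four bytes before the terminator are counted.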
There are many reasons why one would want to store null *characters* in C
strings, using a proper escaping mechanism (a transport syntax, like
transforming the 00 byte that UTF-8 generates for U+0000 into C0 80) or
another encoding scheme (standard UTF-8 does not fit here; one needs another
scheme, like Sun's modified UTF-8).
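As a rough sketch of such a scheme, the C function below encodes an array of code points in the style of Sun's modified UTF-8: U+0000 is written as C0 80, and supplementary characters are written as a surrogate pair of two 3-byte sequences. The names mutf8_encode and put3, the buffer interface, and the error convention are assumptions made for this illustration, not an existing API, and the input is not checked for stray surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value as a 3-byte sequence (E0..EF, then two 80..BF). */
static void put3(unsigned char *p, uint32_t u)
{
    p[0] = (unsigned char)(0xE0 | (u >> 12));
    p[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    p[2] = (unsigned char)(0x80 | (u & 0x3F));
}

/* Hypothetical encoder: converts n code points to modified UTF-8 in out,
 * appends a terminating 00 byte, and returns the number of bytes written
 * before the terminator, or (size_t)-1 on invalid input or lack of room. */
size_t mutf8_encode(const uint32_t *cps, size_t n,
                    unsigned char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        size_t need;

        if (c > 0x10FFFF)
            return (size_t)-1;
        need = (c == 0) ? 2 : (c < 0x80) ? 1 : (c < 0x800) ? 2
             : (c < 0x10000) ? 3 : 6;
        if (cap - len <= need)          /* keep one byte for the terminator */
            return (size_t)-1;

        if (c == 0) {                   /* the escape: U+0000 becomes C0 80 */
            out[len] = 0xC0;
            out[len + 1] = 0x80;
        } else if (c < 0x80) {
            out[len] = (unsigned char)c;
        } else if (c < 0x800) {
            out[len] = (unsigned char)(0xC0 | (c >> 6));
            out[len + 1] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            put3(out + len, c);
        } else {                        /* surrogate pair, 3 bytes each */
            c -= 0x10000;
            put3(out + len, 0xD800 | (c >> 10));
            put3(out + len + 3, 0xDC00 | (c & 0x3FF));
        }
        len += need;
    }
    out[len] = 0x00;                    /* the only 00 byte in the output */
    return len;
}
```

The point of the design is that the only 00 byte in the output is the terminator, so the result can pass through any C string function while every U+0000 in the original text remains recoverable by the matching decoder.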
And I don't consider this to be a "broken" encoding. It's just another
encoding, fully compatible with Unicode *and* with C string conventions.
Using pure UTF-8 in C strings would conform to neither Unicode nor C
conventions, because it would illegitimately restrict the legal embedding of
U+0000 in strings...