John Cowan wrote:
(C Type 'char'...)
> Must have at least 8 bits.
Right.
The standard header <limits.h> (http://www.dinkum.com/htm_cl/limits.html)
defines a constant 'CHAR_BIT' that evaluates to the number of bits in a
'char'.
8 is the smallest value accepted by the standard (although, in practice, I
would be very surprised if any compiler in the world has a value different
than 8).
> sizeof(char) is guaranteed to be 1.
Right, by definition.
>Chars may be signed or unsigned, so
>the portable range is 0 to 127.
>Unsigned chars have a portable range
>of 0 to 255, fortunately.
To be really super portable, one should use the symbols in <limits.h>:
- 'signed char' is between 'SCHAR_MIN' and 'SCHAR_MAX';
- 'unsigned char' is between 0 and 'UCHAR_MAX';
- 'char' is identical to either 'signed char' or 'unsigned char', and ranges
between 'CHAR_MIN' and 'CHAR_MAX'.
Similarly, the standard header <wchar.h>
(http://www.dinkum.com/htm_cl/wchar.html) defines the range of 'wchar_t':
- 'wchar_t' is between 'WCHAR_MIN' (which must be <= 'CHAR_MIN') and
'WCHAR_MAX' (which must be >= 'CHAR_MAX').
> > - "Multibyte string": [...]
> Terminated by a '\0'.
> > - "Wide string": [...]
> Terminated by a L'\0'.
You are probably right. In this case, I must change my definition of
"Multibyte character", because it does not require a null terminator:
Old:
> - "Multibyte character": a multibyte string containing only one character
> (in i18n terms), composed by one or more bytes.
New:
- "Multibyte character": an array of type 'char'
(e.g. 'char mbchr[MB_LEN_MAX] = { 0xC2, 0xB1 };') containing only one
character (in i18n terms), composed of one or more bytes.
The null terminator is not required because all multibyte schemes have a way
to determine how many "trail bytes" follow a "lead byte". In the UTF-8
example above, the leftmost bits of the "lead byte" 0xC2 (110xxxxx) indicate
that exactly one trail byte follows, while the leftmost bits of 0xB1
(10xxxxxx) confirm that it is a valid "trail byte".
Finally, as this is an "off-topic" posting, I'd like to attempt to
demonstrate that the C language is so "general purpose" that anything can
be written in it, including humor.
Find attached an *ASCII* implementation of ANSI C wide and multibyte
characters. I have submitted it to the ASCII Consortium
(http://www.ecs.soton.ac.uk/~rwb197/ascii), but it hasn't yet appeared on
their web site (the ASCII Editorial Committee is probably still balloting
about this :-).
Ciao.
Marco
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT