RE: Multibyte definition

From: Marco.Cimarosti@icl.com
Date: Fri Mar 17 2000 - 05:13:42 EST

Next message: Marco.Cimarosti@icl.com: "RE: Unicode to UTF-8"
Previous message: Peter Constable: "Re: Unicode to UTF-8"
Maybe in reply to: Jeff Moles: "Multibyte definition"
Next in thread: Brendan Murray/DUB/Lotus: "Re: Multibyte definition"

> You can, of course, put whatever you want into a wchar_t but,
> by convention, it tends to be restricted to UCS-2/UTF-16. If
> some application is using these types for something else, I'd
> be very suspicious indeed.

I see this as a gratuitous assumption.

The C type 'wchar_t' is not specifically for Unicode. I don't remember seeing
the term "Unicode" in any of the ANSI C documentation I have read, and I would
be surprised if the C++ standard were any different.

In C terms:

- "Byte": the unit of measure for memory, as reported by the 'sizeof'
operator. Nothing more is implied, although 8 bits is a common size.

- "Type 'char'": an integer type whose size is one "byte" (in C terms). Among
other things, its size is guaranteed to be less than or equal to the size of
type 'wchar_t' ('sizeof(char) <= sizeof(wchar_t)' is always true; 'sizeof(char) <
sizeof(wchar_t)' is *not* always true).

- "Multibyte character": a multibyte string containing only one character
(in i18n terms), composed of one or more bytes.

- "Multibyte string": an array of type 'char' (e.g. 'char mbstr [10] =
"Ciao!"'). Nothing else is implied; the term "multibyte" is only a reminder
of the fact that array elements and characters don't necessarily have a
one-to-one correspondence.

- "Type 'wchar_t'": a type defined (among other places) in header "wchar.h".
Notice the difference with C++, where 'wchar_t' is a built-in type, not
defined in any header. Type 'wchar_t' is guaranteed not to be smaller than type
'char'; no other assumptions are made about its size (although 16 and 32
bits are very common sizes).

- "Wide character": a value of type 'wchar_t' (e.g. 'wchar_t wchr = L'C'').

- "Wide string": an array of type 'wchar_t' (e.g. 'wchar_t wstr [10] =
L"Ciao!"'). Nothing else is implied.

_ Marco


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT