From: Rick Cameron (Rick.Cameron@businessobjects.com)
Date: Thu Aug 12 2004 - 12:36:48 CDT
Hi, Markus
Hardly misleading! You can, of course, view UTF-16 data in memory as an
array of 16-bit code units. But you can also view it as an array of bytes.
This might not be a good idea, but it is necessary occasionally.
When a UTF-16 string is treated as an array of bytes, it's supremely
important to know the byte order. The OP asked about byte order, and seemed
to me to be referring to data in memory. Hence my answer.
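As a quick illustration (a minimal C sketch of my own, not taken from any Windows API): the same UTF-16 data in memory can be inspected either as 16-bit code units or as raw bytes, and only the byte view exposes the machine's endianness.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* "A" followed by U+20AC EURO SIGN, as UTF-16 code units in memory */
        uint16_t s[] = { 0x0041, 0x20AC };
        const unsigned char *bytes = (const unsigned char *)s;

        /* Viewed as 16-bit code units, byte order never shows up */
        printf("code units: %04X %04X\n", s[0], s[1]);

        /* Viewed as bytes, it does: a little-endian machine (e.g. Windows
           on x86) prints 41 00 AC 20, a big-endian machine 00 41 20 AC */
        printf("bytes:      %02X %02X %02X %02X\n",
               bytes[0], bytes[1], bytes[2], bytes[3]);
        return 0;
    }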
Cheers
- rick
-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Markus Scherer
Sent: August 12, 2004 9:19
To: unicode
Subject: Re: Wide Characters in Windows and UTF16
Rick Cameron wrote:
> Microsoft Windows uses little-endian byte order on all platforms.
> Thus, on Windows UTF-16 code units are stored in little-endian byte
> order in memory.
>
> I believe that some Linux systems are big-endian and some
> little-endian. I think Linux follows the standard byte order of the
> CPU. Presumably UTF-16 would be big-endian or little-endian accordingly.
This is somewhat misleading. For internal processing, where we are talking
about the UTF-16 encoding form (quite different from the external encoding
_scheme_ of the same name), we don't have strings of bytes but strings of
16-bit units (WCHAR in Windows). Program code operating on such strings
could not care less what endianness the CPU uses. Endianness is only an
issue when the text gets byte-serialized, as is done for the external
encoding schemes (and usually by a conversion service).
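To make that concrete, here is a minimal C sketch (my own illustration; utf16_to_utf16be is a hypothetical helper, not a real library function) of what such a conversion service does when it writes the UTF-16BE encoding scheme: each code unit is split into bytes explicitly, so the serialized output is identical regardless of CPU endianness.

    #include <stddef.h>
    #include <stdint.h>

    /* Serialize UTF-16 code units into the UTF-16BE encoding scheme.
       The output is the same on little- and big-endian CPUs because each
       code unit is split arithmetically rather than copied from memory. */
    void utf16_to_utf16be(const uint16_t *units, size_t count, uint8_t *out)
    {
        for (size_t i = 0; i < count; ++i) {
            out[2 * i]     = (uint8_t)(units[i] >> 8);    /* high byte first */
            out[2 * i + 1] = (uint8_t)(units[i] & 0xFF);  /* then low byte */
        }
    }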
markus