From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Sep 22 2006 - 19:08:25 CDT
Kenneth Whistler wrote on Friday, September 22, 2006 11:09 PM
>> Unsigned int is only guaranteed a range of 0 to 0xffff and
>> therefore it can't normalise the string <U+FAD5> - the normalised form is
>> <U+25249> in all four normalisations.
>
> It *can*, if you abstract your type definitions correctly.
>> Of course, unsigned int is good
>> enough to hold UTF-16 code *units*, which might just be what Mike meant.
>> (I.e., the type supports UTF-16, but not UTF-32.)
> It is perfectly fine for UTF-32, if you do this correctly.
I.e. avoid compilers where plain int is only 16 bits. Such compilers are
certainly conforming under the 1990 C standard.
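To make the 16-bit case concrete, here is a minimal sketch of why U+25249 needs two UTF-16 code units, each of which does fit in a 16-bit unsigned int. The helper name utf16_split is hypothetical and not from this thread. The code point itself is held in unsigned long, which the 1990 standard guarantees to be at least 32 bits wide.

```c
#include <assert.h>

/* Hypothetical helper, not from the thread: split a supplementary code
   point (U+10000..U+10FFFF) into two UTF-16 code units.  The code point
   is held in unsigned long (at least 32 bits in C90); each resulting
   code unit fits in an unsigned int even when int is only 16 bits. */
void utf16_split(unsigned long cp, unsigned int *hi, unsigned int *lo)
{
    unsigned long v;

    assert(cp >= 0x10000UL && cp <= 0x10FFFFUL);
    v = cp - 0x10000UL;                              /* 20-bit offset */
    *hi = (unsigned int)(0xD800UL + (v >> 10));      /* high surrogate */
    *lo = (unsigned int)(0xDC00UL + (v & 0x3FFUL));  /* low surrogate */
}
```

Called with 0x25249, this should yield the code units 0xD854 and 0xDE49.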
> ...
> At that point, you can safely port your entire code to *any*
> platform, with at most one compiler-specific #ifdef in your
> fundamental header file.
That is true, but you would no longer necessarily be using 'unsigned int'
for UTF-32.
You could use something like:
#include <limits.h>
#if UINT_MAX >= 0x10FFFF
typedef unsigned int utf32char;
#else
typedef unsigned long int utf32char;
#endif
Or did you count this as compiler-specific?
> And if you need to use arbitrary
> buffers of Unicode character data, including embedded NULLs
> and noncharacters, then you are better off using separate tracking
> of buffer length, anyway.
And you need to be able to include embedded nulls to pass some of the
Unicode conformance tests. I know, because that tripped me up in the past.
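A minimal sketch of what separate length tracking looks like in practice, assuming a utf32char typedef along the lines discussed above. The function name utf32_equal is hypothetical. The point is only that U+0000 is treated as ordinary data rather than as a terminator.

```c
#include <stddef.h>

/* Assumed for illustration: any unsigned type of at least 21 bits works. */
typedef unsigned long utf32char;

/* Hypothetical helper: compare two UTF-32 buffers whose lengths are
   passed explicitly.  Returns 1 if they are identical, 0 otherwise.
   Embedded U+0000 values are compared like any other code point. */
int utf32_equal(const utf32char *a, size_t alen,
                const utf32char *b, size_t blen)
{
    size_t i;

    if (alen != blen)
        return 0;
    for (i = 0; i < alen; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

A strcmp-style loop that stops at the first zero would treat <U+0041, U+0000, U+0042> and <U+0041, U+0000, U+0043> as equal; the version above does not.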
Steve Summit wrote on Saturday, September 23, 2006 12:00 AM
> Ken Whistler wrote:
>> It is perfectly fine for UTF-32, if you do this correctly.
>> For example:
>>
>> typedef unsigned short UShort16;
>> typedef unsigned int UInt32;
>>
>> typedef UShort16 utf16char;
>> typedef UInt32 utf32char;
>
> Please don't do this! Please do
>
> #include <stdint.h>
>
> typedef uint16_t utf16char;
> typedef uint32_t utf32char;
>
> instead.
>
>> At that point, you can safely port your entire code to *any*
>> platform, with at most one compiler-specific #ifdef in your
>> fundamental header file.
That will only work if your compiler supports the C99 standard, which
introduced <stdint.h>. The ones I use don't claim to comply, and the one I
use at home would simply fail to compile the above.
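One way to reconcile the two suggestions, offered only as a sketch: test __STDC_VERSION__ and use <stdint.h> where C99 is claimed, falling back to the <limits.h> check otherwise. It assumes that a compiler defining __STDC_VERSION__ as 199901L or later really does ship <stdint.h> with the exact-width types, and that unsigned short is an adequate UTF-16 code unit type on the fallback path.

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 or later: take the fixed-width types from <stdint.h>. */
#include <stdint.h>
typedef uint16_t utf16char;
typedef uint32_t utf32char;
#else
/* C90 fallback: unsigned short is at least 16 bits wide. */
typedef unsigned short utf16char;
#if UINT_MAX >= 0x10FFFF
typedef unsigned int utf32char;   /* int is wide enough for U+10FFFF */
#else
typedef unsigned long utf32char;  /* long is at least 32 bits in C90 */
#endif
#endif
```

On the fallback path the types may be wider than 16 or 32 bits, so code that relies on exact widths (for example, wrap-around arithmetic) would need masking.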
Richard.