From: Clark Cox (clarkcox3@gmail.com)
Date: Fri Jan 21 2005 - 08:27:16 CST
On Fri, 21 Jan 2005 08:42:51 -0000, Arcane Jill <arcanejill@ramonsky.com> wrote:
>
>
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Hans Aberg
> Sent: 20 January 2005 20:47
> To: Antoine Leca; unicode@unicode.org
> Subject: Re: 32'nd bit & UTF-8
>
> > That already seems to have happened with GNU GCC, which fixes wchar_t to
> > 32-bits.
>
> and Microsoft Visual C++, which fixes wchar_t to SIXTEEN bits.
>
> The existence of wchar_t does not imply UTF-32. It does not imply UTF-16. It
> does not even imply Unicode. It's just a type.
But, if __STDC_ISO_10646__ is defined, then it does imply that wchar_t
can represent all of the Unicode/ISO-10646 characters. From the C
standard:
"__STDC_ISO_10646__ An integer constant of the form yyyymmL (for
example, 199712L). If this symbol is defined, then every character in
the "Unicode required set", when stored in an object of type
wchar_t, has the same value as the short identifier of that
character. The "Unicode required set" consists of all the characters
that are defined by ISO/IEC 10646, along with all amendments and
technical corrigenda, as of the specified year and month."
In addition, it seems that there is no way a conforming C
implementation can use wchar_t to represent UTF-16. If
__STDC_ISO_10646__ is less than 200111L, then the referenced edition
of ISO/IEC 10646 predates the supplementary planes (and UTF-16 itself),
so wchar_t is effectively UCS-2 in that case; and if
__STDC_ISO_10646__ is greater than or equal to 200111L, then a single
16-bit wchar_t is not large enough to hold every character "defined by
ISO/IEC 10646". Surrogate pairs don't help, either, because the quoted
wording requires each character, stored in one object of type wchar_t,
to equal its short identifier.
--
Clark S. Cox III
clarkcox3@gmail.com
http://www.livejournal.com/users/clarkcox3/
http://homepage.mac.com/clarkcox3/
This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 08:30:28 CST