] From unicode@Unicode.ORG Wed Dec 18 16:12:43 1996
] Reply-To: Christopher.Vance@adfa.oz.au
] 
] |  The ANSI C has not suggested to use wchar_t.  For Unicode UCS-2, we should use 
] |  a user-defined type.  IBM ULS uses the unichar as the 16-bit unsigned short 
] |  for UCS-2.  That should be the approach. 
] 
] I'll have to take your word on whether ANSI has been adopting changes 
] made to the real C standard as they're made, but I do believe that 
] ANSI abandoned its separate C standard in favour of the ISO edition, 
] when it was adopted.  ISO C most definitely does include wchar_t, and 
] has done for a while, even if some national standards haven't caught 
] up.
] 
] If you're using non-standard or obsolete compilers, I can't help you. 
]  Standard headers include one for wchar_t (I think it's <wchar.h>, 
] but my copy is buried somewhere).
] 
] Then again, there's no guarantee which wide character set is used for 
] wchar_t.  Perhaps this is a locale issue?
] 
] |  wchar_t is not intended to be a published data type as char or byte. 
] 
] Excuse me?  Says who?  Since when?  Citation, please.
] 
] -- Christopher
] 
Hello,
Unfortunately, no standard defines or explicitly specifies an API or
data type for Unicode/UCS-2. I think this is one big issue (and a reason
why multi-platform software development is not an easy task) that system
vendors should try to solve by coming up with a single specification,
hopefully something like what IBM once proposed with their ULS.

wchar_t is an opaque data type (some say semi-opaque), and you shouldn't
make assumptions about its representation. You can find various UNIX
specs and standards sources saying that it is an opaque data type, for
instance, XPG4/4.2/5 that subsumes POSIX.1, ANSI C/ISO C, SVID3 and System V
ABI, ... The only thing you can be sure of about the wchar_t type (if the
OS you are dealing with is XPG4-compliant) is that 0 and the PCS
(Portable Character Set) characters exist in wchar_t, with the same code
values they have as char, so you can assign them directly.
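
To make the PCS guarantee concrete, here is a minimal sketch in C. It widens a char string into wchar_t by plain assignment, but only when every character belongs to the ISO C basic character set (used here as a conservative subset of the PCS). The names is_portable_char() and widen_portable() are made up for this example and do not come from any standard or vendor API. Anything outside that set should go through the locale-aware conversion functions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: is c a member of the ISO C basic character set
   (used here as a conservative subset of the PCS)? */
static int is_portable_char(char c)
{
    static const char pcs[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        " !\"#%&'()*+,-./:;<=>?[\\]^_{|}~\t\n";
    return c != '\0' && strchr(pcs, c) != NULL;
}

/* Hypothetical helper: widen src into dst by direct assignment.
   This relies only on the guarantee that portable characters keep
   their char value in wchar_t; it assumes nothing else about the
   wchar_t encoding.  Returns the number of wide characters written
   (excluding the terminator), or (size_t)-1 if src contains a
   non-portable character or dst is too small. */
size_t widen_portable(wchar_t *dst, size_t dstlen, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dstlen == 0)
        return (size_t)-1;
    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dstlen || !is_portable_char(src[i]))
            return (size_t)-1;
        dst[i] = (wchar_t)src[i];
    }
    dst[i] = L'\0';
    return i;
}
```

For text that may contain anything else, mbstowcs() in the current locale is the portable route, and the resulting wchar_t values should still be treated as opaque.
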

I know this is not going to help but... for your information, in
Solaris 2.6 we are also going to provide two UTF-8 locales, en_US.UTF-8
and ko.UTF-8, and since sizeof(wchar_t) == 4 in SunOS, we chose to
support UCS-4 in Sun's wchar_t. However, again, this is a (semi-)opaque
data type whose internal representation vendors can choose and change...

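
If you want to see what a given platform does, a rough sketch like the following can report how wide wchar_t is. The function names wchar_bits() and wchar_can_hold() are hypothetical, and the code assumes a compiler that provides WCHAR_MAX in <wchar.h> (ISO C Amendment 1). Note that it only tells you whether wchar_t is wide enough for UCS-2 or UCS-4 code values. It cannot tell you which wide character set the vendor actually stores there.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: width of wchar_t in bits on this platform. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: nonzero if wchar_t can represent every code
   value up to max_code.  This is a capacity check only; it says
   nothing about the encoding the implementation really uses. */
int wchar_can_hold(unsigned long max_code)
{
    return (unsigned long)WCHAR_MAX >= max_code;
}

/* Largest code values of the two forms discussed in this thread
   (UCS-4 taken as a 31-bit space). */
#define UCS2_MAX 0xFFFFUL
#define UCS4_MAX 0x7FFFFFFFUL
```

A caller might check wchar_can_hold(UCS4_MAX) before deciding whether a separate application-defined type is needed for UCS-4 data. On a system where sizeof(wchar_t) == 4 the check should succeed, but that still does not prove the values are UCS-4.
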
Ienup