> We in the ISO POSIX WG have been through all the POSIX standards to see
> what changes we should make to the standards to accommodate UCS.
Markus Kuhn wrote:
> I guess pretty much the only thing the POSIX standard needs for UTF-8
> is a standardized way to tell the locale mechanism that the character
> encoding in use is UTF-8. UTF-8 is a little bit more than yet another
> character table, so there should be some locale flag or something
> similar that lets me tell libc that UTF-8 is the encoding in use.
The original question was what changes, if any, are needed in
POSIX to accommodate UCS. There aren't any that I can think of,
if we assume an implementation is using UTF-8 as the multibyte
external code and UCS as an internal wide character format.
Given that, there's no reason POSIX needs a flag or anything
else to make it aware it's using UTF-8. POSIX is designed to
be code set independent.
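To illustrate the point, here is a minimal sketch (not from this thread) of how a program can find out whether its current locale uses UTF-8 without any UTF-8-specific flag: it asks for the codeset name through nl_langinfo(CODESET) from <langinfo.h>, an X/Open interface that is now part of POSIX. The helper names codeset_is_utf8() and locale_is_utf8() are hypothetical, and the comparison assumes the implementation spells the codeset as some variant of "UTF-8" (such as "utf8"), since codeset names are implementation-defined.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX.1-2008 declarations */
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return 1 if the codeset name cs equals "utf8"
   when compared case-insensitively with '-' and '_' ignored, so that
   "UTF-8", "utf8" and "UTF8" all match; return 0 otherwise. */
int codeset_is_utf8(const char *cs)
{
    const char *want = "utf8";

    assert(cs != NULL);
    for (; *cs != '\0'; cs++) {
        if (*cs == '-' || *cs == '_')
            continue;            /* ignore separators */
        if (*want == '\0' || tolower((unsigned char)*cs) != *want)
            return 0;            /* extra or mismatching character */
        want++;
    }
    return *want == '\0';        /* matched all of "utf8" */
}

/* Hypothetical helper: return 1 if the current LC_CTYPE locale uses
   UTF-8. The caller is expected to have called setlocale() already. */
int locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would typically call setlocale(LC_ALL, "") first, so that LC_CTYPE reflects the user's environment, and then call locale_is_utf8(). The same two calls work unchanged for any other codeset name.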
. . .
> What's the state of standardization with regard to specifying in a
> locale that we use UTF-8? What does enUS.UTF-8 look like?
It differs from what most other implementations use. With
the values in your example, most would write this as en_US.UTF-8.
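The naming difference shows up when a name is split into its parts. The sketch below assumes the common X/Open convention language[_territory][.codeset][@modifier]; POSIX itself does not mandate this format, so struct locale_parts and parse_locale_name() are hypothetical, not a standard interface. With it, "en_US.UTF-8" yields language "en" and territory "US", while "enUS.UTF-8" yields a single language field "enUS" and no territory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure: the parts of a locale name written in the
   common X/Open form language[_territory][.codeset][@modifier]. */
struct locale_parts {
    char language[32];
    char territory[32];
    char codeset[32];
    char modifier[32];
};

/* Copy n bytes of src into dst (capacity dstsz) and NUL-terminate.
   Returns 0 on success, -1 if the field does not fit. */
static int copy_field(char *dst, size_t dstsz, const char *src, size_t n)
{
    if (n >= dstsz)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Hypothetical helper: split name into its parts. Missing parts are
   left as empty strings. Returns 0 on success, -1 if the language
   part is empty or any part is too long. */
int parse_locale_name(const char *name, struct locale_parts *p)
{
    size_t n;
    const char *rest;

    assert(name != NULL && p != NULL);
    memset(p, 0, sizeof *p);

    n = strcspn(name, "_.@");          /* language ends at first separator */
    if (n == 0 || copy_field(p->language, sizeof p->language, name, n) != 0)
        return -1;
    rest = name + n;

    if (*rest == '_') {                /* optional _territory */
        n = strcspn(rest + 1, ".@");
        if (copy_field(p->territory, sizeof p->territory, rest + 1, n) != 0)
            return -1;
        rest += 1 + n;
    }
    if (*rest == '.') {                /* optional .codeset */
        n = strcspn(rest + 1, "@");
        if (copy_field(p->codeset, sizeof p->codeset, rest + 1, n) != 0)
            return -1;
        rest += 1 + n;
    }
    if (*rest == '@') {                /* optional @modifier */
        n = strlen(rest + 1);
        if (copy_field(p->modifier, sizeof p->modifier, rest + 1, n) != 0)
            return -1;
    }
    return 0;
}
```

Note that the parser only recognizes the convention; whether a given name is actually installed is still up to the implementation, which setlocale() reports by returning NULL.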
> It might also be useful if POSIX would clarify how all the new
> ISO C Amendment 1 functions for wide streams and multibyte strings work
> in detail if we have selected the UTF-8 encoding in the locale. . .
POSIX doesn't include information about any specific encoding,
UTF-8 or otherwise. It is designed to work with a variety of
encodings, so it doesn't make sense for it to include specific
details of how it might work with a UTF-8-based locale any more
than it would make sense for it to include details of how it
might work with an ISO 8859-1-based locale or a Japanese EUC-based
locale. Yes, yes, I know UTF-8 and Unicode/UCS are universal
encodings, but from POSIX's point of view, that's irrelevant.
They're just encodings.
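As an illustration of that code set independence, the sketch below counts the characters in a multibyte string with the ISO C Amendment 1 function mbrtowc(). Nothing in it refers to UTF-8: the decoding rules come from whichever LC_CTYPE locale is current, so the same code serves a UTF-8 locale, an ISO 8859-1 locale, or a Japanese EUC locale. count_chars() is a hypothetical name used only for this example.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), used by callers to pick the locale */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the NUL-terminated
   multibyte string s, decoded according to the current LC_CTYPE
   locale. Returns (size_t)-1 if s contains a byte sequence that is
   invalid or incomplete in that locale. */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t count = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    memset(&st, 0, sizeof st);      /* start in the initial conversion state */
    while (len > 0) {
        size_t r = mbrtowc(NULL, s, len, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (r == 0)
            break;                  /* NUL character; strlen() makes this unreachable */
        s += r;                     /* advance past one complete character */
        len -= r;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, the six-byte string "naïve" gives 5; in a single-byte locale such as one based on ISO 8859-1, the same function counts each byte as one character.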
Sandra Martin O'Donnell
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT