Re: UTF-8, ISO C Am.1, and POSIX

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Aug 13 1997 - 03:17:37 EDT


>POSIX doesn't include information about any specific encoding,
>UTF-8 or otherwise. It is designed to work with a variety of
>encodings, so it doesn't make sense for it to include specific
>details of how it might work with a UTF-8-based locale any more
>than it would make sense for it to include details of how it
>might work with an ISO 8859-1-based locale or a Japanese EUC-based
>locale. Yes, yes, I know UTF-8 and Unicode/UCS are universal
>encodings, but from POSIX's point of view, that's irrelevant.
>They're just encodings.

That's just what's wrong with POSIX from the perspective of an implementer
of the Unicode Standard. Unicode has well-defined character semantics that
are considered a property of the character itself and are therefore not
locale-dependent. A shorthand notation that tells the standard library to
support these semantics is indeed called for. In an indirect way, this is
analogous to the 'C' locale, with its minimal guarantees. A "Unicode" locale
(or, more correctly, the character type subset of a locale) seems a
reasonable extension.
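
To make "a property of the character itself" concrete, here is a minimal
sketch in Python. It is only an illustration, not anything POSIX or ISO C
defines: the helper name unicode_ctype is hypothetical, and the mapping from
Unicode general categories to ctype-style classes is a simplifying
assumption. The point is that the answers come from the Unicode Character
Database alone, so no locale setting can change them.

```python
import unicodedata

def unicode_ctype(ch):
    """Classify one character from its Unicode general category only.

    Hypothetical helper: the class names mimic <ctype.h>, but the mapping
    below is an illustrative assumption, not a standard definition.
    """
    category = unicodedata.category(ch)  # e.g. 'Lu', 'Ll', 'Nd'
    return {
        "alpha": category.startswith("L"),  # any letter category
        "upper": category == "Lu",          # uppercase letter
        "lower": category == "Ll",          # lowercase letter
        "digit": category == "Nd",          # decimal digit, any script
    }

# U+00C9 LATIN CAPITAL LETTER E WITH ACUTE is uppercase in every locale.
print(unicode_ctype("\u00c9"))
# U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit in every locale.
print(unicode_ctype("\u0663"))
```

A "Unicode" character type category in a locale would amount to the library
answering such queries from tables like these instead of from per-locale
definitions.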

BTW, nothing prevents anybody from supporting the character semantics
discovered and catalogued by Unicode for other character sets (for the
corresponding characters). There has been more than one implementation
of Unicode's bidi algorithm on top of 8-bit character sets, to give just one
example.
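
As a rough sketch of how that can work, the Python below looks up the Unicode
bidirectional class for each byte of text in an 8-bit character set by going
through the corresponding Unicode character. The function name bidi_classes
and the choice of ISO 8859-8 as the default are assumptions made for the
example. It covers only the per-character class lookup, not the bidi
algorithm itself, and it does not describe any particular implementation.

```python
import unicodedata

def bidi_classes(data, charset="iso8859_8"):
    """Return the Unicode bidi class for each byte of 8-bit encoded text.

    Hypothetical helper: assumes a single-byte charset, so that every
    byte corresponds to exactly one Unicode character.
    """
    text = data.decode(charset)  # map each byte to its Unicode character
    return [unicodedata.bidirectional(ch) for ch in text]

# 0xE0 is HEBREW LETTER ALEF in ISO 8859-8; the other bytes are ASCII.
print(bidi_classes(b"\xe0 A1"))  # ['R', 'WS', 'L', 'EN']
```

An implementation that wanted to avoid the conversion at run time could
precompute a 256-entry table of classes for the character set once and index
it directly by byte value.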

A./


