Re: UTF-8, ISO C Am.1, and POSIX

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Aug 13 1997 - 06:57:31 EDT


Asmus Freytag writes:

(I presume it was Sandra Martin O'Donnell who wrote the first cited words.)
> >Yes, yes, I know UTF-8 and Unicode/UCS are universal
> >encodings, but from POSIX's point of view, that's irrelevant.
> >They're just encodings.
>
> That's just what's wrong with POSIX from the perspective of an implementer
> of the Unicode Standard. Unicode has well defined character semantics that
> are considered a property of the character itself and therefore not locale
> dependent. A shorthand notation to kick the standard library into supporting
> these is indeed called for. In an indirect way, it's analogous to the 'C'
> locale, with its minimal guarantees. A "Unicode" locale (or more correctly,
> the character type subset of a locale) seems a reasonable extension.

This is being worked on in the forthcoming ISO standard 14652.
>
> BTW, there is nothing that prevents anybody from supporting the character
> semantics discovered and catalogued by Unicode for other character sets (for
> the corresponding characters). There has been more than one implementation
> of Unicode's bidi-algorithm on top of 8-bit character sets, to give just one
> example.

That is also the way 14652 does it: it is defined on the repertoire
of 10646 but also applies to subrepertoires thereof.
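
To illustrate the point about subrepertoires, here is a minimal sketch,
assuming Python and its standard unicodedata module (neither is part of
14652 or POSIX, and bidi_classes is a hypothetical helper name). It maps
text in an 8-bit character set onto the 10646 repertoire and looks up a
character property there, so the property comes from the character and
not from the locale or the encoding:

```python
import unicodedata

def bidi_classes(raw, charset):
    """Return the Unicode bidirectional class of each character in raw.

    raw is a byte string in the 8-bit character set named by charset.
    Each byte is first mapped to its 10646/Unicode character, and the
    property is then read from the Unicode character database.
    """
    text = raw.decode(charset)
    return [unicodedata.bidirectional(ch) for ch in text]

# Byte 0xE0 is HEBREW LETTER ALEF (U+05D0) in ISO 8859-8.
hebrew = bidi_classes(b"\xe0", "iso8859_8")

# The same lookup works unchanged for ISO 8859-1 text.
latin = bidi_classes(b"A1", "iso8859_1")
```

The lookup itself never consults a locale; only the decoding step depends
on which 8-bit character set the data is in.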

Keld


