Re: UTF-8 and locales

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Thu Nov 26 1998 - 06:40:55 EST


On Thu, 26 Nov 1998, Markus Kuhn wrote:

> It seems to me there are extremely few people around, who might have
> understood how the interaction between locales and UTF-8 is supposed to
> work (and I have not yet managed to become one of them, in spite of
> serious interest in the matter). Both the ISO C and the POSIX standards
> are unreadable and practically useless here. What would be urgently
> needed is a readable widely available tutorial that teaches programmers
> how to enable software for UTF-8 in a portable way with locales.

> Maybe another Unicode Technical Report?

  Judging from what I know about Unix locales (which may be lacking and
deficient in some or many respects), I don't think the subject is that
difficult, nor do I regard the ISO C and POSIX standards as obscure in
that respect. As I wrote a few times before, UTF-8 is NOTHING more and
NOTHING less than just ANOTHER multibyte encoding, and it can be dealt
with in exactly the same way as other multibyte encodings
(EUC-{JP,CN,TW,KR}, Big5, Shift_JIS, JOHAB, etc.) and, for that matter,
single-byte encodings are treated. IMHO, Solaris 2.6/7 and AIX 4 (with
UTF-8 locale support) showed a couple of clear examples in this regard.
It matters little what is used as the "wide character" internally, as
ISO C leaves that up to the implementation (UCS-4, UCS-2, or whatever
an implementor finds appropriate).
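
  To make the point concrete, here is a minimal sketch (not taken from
any particular vendor's code) of encoding-independent character
counting using only the ISO C Amendment 1 multibyte interface. The
function name count_chars is hypothetical; the only assumption is that
the caller has already selected a locale with setlocale().

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count characters (not bytes) in a multibyte string, decoding it
 * according to the LC_CTYPE category of the current locale.
 * Returns (size_t)-1 if the string is invalid or incomplete in that
 * encoding. count_chars is an illustrative name, not a standard one. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */

    while (remaining > 0) {
        wchar_t wc;                     /* value is implementation-defined */
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* decoded a null character */

        s += n;                         /* skip the bytes just consumed */
        remaining -= n;
        count++;
    }
    return count;
}
```

  A program would call setlocale(LC_ALL, "") once at startup and then
use count_chars() unchanged, whether the user's locale is a UTF-8 one,
an EUC-KR one, or a single-byte one. Nothing in the function mentions
UTF-8, and it never inspects the numeric value stored in wc.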

    Jungshik Shin


