More about UTF-8S: don't multiply UTFs

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Thu Jun 14 2001 - 04:09:20 EDT


Dear all,

In the discussion about UTF-8S, there is one point that has not been
mentioned (or else I missed it).

Most people seem to be arguing from the point of view of users and
developers on platforms on which Unicode is well-established as the
default encoding. On Unix-like systems, however, ISO 2022-based
encodings are still alive and kicking. Hard.

One of the main arguments in favour of using Unicode on such platforms
is that it leads to a world in which there is only one encoding, both
for the user and the developer. The multiplication of UTFs, however,
not only breaks this model, but also leads to much confusion. (Heck,
many users still think that UTF-8 and Unicode are two completely
unrelated encodings! Try explaining to them that UTF-16 is Unicode
too!)

I tried to point this out when IANA were introducing UTF-16BE
and other monstrosities, only to be treated in a rather patronising
manner by some of the respectable members of this list (``Juliusz's
confusion can be explained by...''). Folks, from a user's perspective,
UTF-8 and UTF-16 are two different encodings. Please don't make
the situation worse than it already is. Don't create any more UTFs.

Whatever happens, we will continue to promote signature-less UTF-8 as
the only user-visible encoding, and signature-less UTF-8 (multibyte,
mb) and BOM-less UCS-4 (wide character, wc) as the only
programmer-visible ones. The more UTFs the Unicode Consortium
legitimises, the more explaining we'll
have to do that ``this is just a version of Unicode used on some other
platforms, please convert it to UTF-8 before use.''
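
As a rough illustration of the "convert it to UTF-8 before use" step,
here is a minimal sketch in Python. The function name to_plain_utf8 is
hypothetical, and the sketch assumes that input without a recognised
signature is already UTF-8. It does not try to detect UCS-4/UTF-32 or
BOM-less UTF-16.

```python
import codecs


def to_plain_utf8(data: bytes) -> bytes:
    """Return `data` re-encoded as signature-less UTF-8.

    Hypothetical helper: handles only a UTF-8 signature or a UTF-16 BOM.
    Anything else is assumed (not verified beyond decoding) to be UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Drop the three-byte UTF-8 signature and keep the rest.
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and does not include it in the decoded text.
        text = data.decode("utf-16")
    else:
        # Assumption: unmarked input is already UTF-8.
        text = data.decode("utf-8")
    # Python's "utf-8" encoder never writes a signature.
    return text.encode("utf-8")
```

Input that is not valid in the guessed encoding raises
UnicodeDecodeError rather than being passed through silently.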

Regards,

                                        Juliusz Chroboczek


