Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Gunther Schadow (gunther@aurora.rg.iupui.edu)
Date: Mon Aug 24 1998 - 15:17:25 EDT


There have been three contributions to this discussion from Chris
Wendt <christw@microsoft.com>, Jungshik Shin <jungshik.shin@yale.edu>
and Glen Perkins <gperkins+uaa@memwizard.com>, that I am a bit
concerned about. First of all, they all seem to be based on a
misunderstanding of what Dan Oscarsson meant by "true subset" of
Unicode. Dan already cleared this up in his reply to Chris Wendt.

The other issue that I am concerned about is that the underlying
arguments against an ISO Latin-1 compatible UTF seem to be more of
the political-correctness type. Everyone, including Dan and me, is
happy about Unicode and the way it takes care of all languages in
the world. We are not arguing for a narrowing of the scope of
Unicode. However, there are four observations that I guess everyone
must recognize regardless of what script he or she uses natively or
uses most often:

(1) Unicode code values are backwards compatible with 7-bit
US-ASCII. E.g., the character value 65 is the Latin capital letter A.

(2) In the exact same sense as (1), Unicode is compatible with the ISO
Latin-1 extensions to ASCII, where, e.g., 196 stands for the Latin
capital letter A with diaeresis.

(3) UTF-8 and UTF-7 are encodings for Unicode that are most useful for
the majority of languages used on the continents of North America,
South America, and Europe. All other languages will probably prefer to
use the 16-bit Unicode integers directly.

(4) UTF-8 and UTF-7 are based on the backward compatibility of Unicode
with US-ASCII (1), but they neglect the backward compatibility of
Unicode with ISO Latin-1 (2).
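
To make points (1), (2), and (4) concrete, here is a small Python
sketch. It is only an illustration: the helper name encoded_forms and
the two sample characters are my own assumptions, not part of any
standard. It prints the code value of a character next to its ISO
Latin-1 byte and its UTF-8 bytes.

```python
# Illustrative sketch only; the helper name and sample characters are assumptions.

def encoded_forms(ch):
    """Return (code value, ISO Latin-1 bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("latin-1"), ch.encode("utf-8")

# Every Latin-1 byte value 0..255 decodes to the Unicode code value of the
# same number, which is the compatibility described in (1) and (2).
latin1_matches_unicode = (
    bytes(range(256)).decode("latin-1") == "".join(chr(i) for i in range(256))
)

for ch in ("A", "\u00c4"):  # U+0041 'A' and U+00C4 'A with diaeresis'
    value, latin1, utf8 = encoded_forms(ch)
    print(f"value={value}  latin-1={latin1.hex()}  utf-8={utf8.hex()}")
```

For the letter A (65) the Latin-1 byte and the UTF-8 byte are
identical, while A with diaeresis (196) is one byte in Latin-1 but two
bytes in UTF-8. That difference is what (4) refers to.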

So, if you are one of those highly respected members of the world
population who prefers writing in Greek, Cyrillic, Chinese, Japanese,
Devanagari, Thai, or Malayalam, I am not asking you to concern yourself
with my call for an ISO Latin-1 compatible UTF. But I ask you to
consider why an Anglo-Americanocentric UTF is good while a UTF for all
scripts based on Latin is so bad and politically incorrect to call for
(BTW: wouldn't Vietnamese be supported by ISO Latin-1 as well?). If the
Khmer script were integrated into Unicode in some backward compatible
manner, such as by always stripping bits from or adding bits to your
code page, I
would certainly support your call for a UTF that facilitates a
graceful transformation of the Terabytes of legacy software used and
produced in Kampuchea. I am open to this.

But may I please ask you (especially the US-residents among the
fighters for political correctness) at least not to interfere with a
call for a UTF that is as compatible as Unicode is by itself? I think
that the issue with UTF-7 and UTF-8 is more about broadening the
narrow Anglo-American view of the world than about narrowing the
beautiful global view of Unicode towards a Eurocentrism.

regards
-Gunther Schadow

P.S. We will gladly use the UTF-whatever escape sequence to refer to
the Unicode Euro character if we have to.

Gunther Schadow ----------------------------------- http://aurora.rg.iupui.edu
Regenstrief Institute for Health Care
1001 W 10th Street RG5, Indianapolis IN 46202, Phone: (317) 630 7960
schadow@aurora.rg.iupui.edu ---------------------- #include <usual/disclaimer>


