Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Dan Kegel (dank@alumni.caltech.edu)
Date: Tue Aug 25 1998 - 10:30:17 EDT


Kevin Bracey wrote:
> Gunther Schadow <gunther@aurora.rg.iupui.edu> wrote:
> > But may I please ask you (especially the US-residents among the
> > fighters for political correctness) at least not to interfere with a
> > call for a UTF that is as compatible as Unicode is by itself? ...
>
> Okay, Gunther, here's my take on this. You want to use Unicode, but you
> have lots of Latin-1 text you want to still be able to use. Your idea
> is to define a new UTF-8 variant that states that illegal sequences
> should be interpreted as Latin-1 bytes. You will then declare all your
> data to be "UTF-8x" or whatever you call it.
>
> At first this may seem like a good idea, but it falls down on a lot
> of counts. ... [deletia] What you really want to do is ... autodetect of
> the encoding of a file as a whole. Autodetection of the format of a data
> stream is fairly easy - make your applications autodetect the format ...
> This will work reasonably well. Far better would be to move to some sort of
> scheme whereby you can tag data externally as Latin-1 or UTF-8 (or any other
> encoding), just as HTML 4.0 does.
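Here is a minimal sketch of the whole-file autodetection Kevin describes. It assumes the only candidates are UTF-8 and Latin-1: if the entire byte stream decodes as valid UTF-8, treat it as UTF-8; otherwise fall back to Latin-1, which accepts any byte sequence. The function names `detect_encoding` and `read_text` are hypothetical.

```python
def detect_encoding(data: bytes) -> str:
    """Guess the encoding of a whole file: UTF-8 if valid, else Latin-1.

    Assumes the only candidates are UTF-8 and ISO 8859-1. Every byte
    sequence is valid Latin-1, so Latin-1 serves as the fallback.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"
    return "utf-8"


def read_text(data: bytes) -> str:
    """Decode a whole file using the autodetected encoding."""
    return data.decode(detect_encoding(data))


if __name__ == "__main__":
    print(detect_encoding("café".encode("utf-8")))    # utf-8
    print(detect_encoding("café".encode("latin-1")))  # latin-1
```

Pure-ASCII input reports "utf-8", which is harmless because ASCII is identical in both encodings. The check is still only a heuristic. A Latin-1 file can happen to be valid UTF-8 (the bytes for "Ã©" decode as "é"), which is why external tagging is the more reliable option.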

Kevin's summary of the problems with Gunther's proposal is great.
Gunther, what you want would be nice, but it's a can of worms
technically. I know you didn't want to change your apps, but you
might have to. Alternatively, if the apps of interest are tools
for viewing but not modifying data, you might consider writing
a gateway program that would keep an up-to-date copy of the
Unicode data downconverted to Latin-1 plus some visual encoding of
non-Latin-1 chars, so your legacy apps could at least see some
of the data.
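Dan doesn't specify which "visual encoding" the gateway would use. The following minimal sketch assumes a hypothetical `<U+XXXX>` marker for each character outside Latin-1. Characters U+0000..U+00FF map one-to-one onto Latin-1 bytes and pass through unchanged. The function name `to_latin1_view` is also hypothetical.

```python
def to_latin1_view(text: str) -> bytes:
    """Downconvert Unicode text to Latin-1 bytes for legacy viewers.

    Characters in ISO 8859-1 (U+0000..U+00FF) pass through unchanged;
    anything else becomes a visible ASCII marker such as <U+20AC>.
    The marker format is an assumption, not part of any standard.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(ch)              # representable in Latin-1
        else:
            out.append(f"<U+{cp:04X}>") # visible stand-in for the rest
    return "".join(out).encode("latin-1")


if __name__ == "__main__":
    print(to_latin1_view("café €5"))    # b'caf\xe9 <U+20AC>5'
```

Python's built-in error handlers give similar results without custom code. For example, `"€".encode("latin-1", "xmlcharrefreplace")` yields `b"&#8364;"`, and `"backslashreplace"` yields `\u20ac`-style escapes. Any such marker is lossy, since the legacy app cannot tell a marker from literal text that happens to look the same. That is acceptable only for read-only viewing.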

- Dan 'big fat mouth' Kegel


