RE: UTF-8 in email

From: Murray Sargent (murrays@microsoft.com)
Date: Fri Oct 16 1998 - 21:46:14 EDT


Markus, I love your dream. If the world were just one charset, namely
UTF-8, we'd all be sooooo happy! It would save oodles of programming and
testing time and oodles of frustrated users who see files displayed with
incorrect charsets.

But unfortunately the world is complicated and we have a myriad charsets
used in a myriad squared documents that already exist or are being created.
This is true in the Windows world and I suspect it's equally true in the
various Unix worlds (Plan 9 partly, but not completely exempted, since they
still get files from "out there"). General purpose software has to work in
such worlds, especially since the vast majority of users know nothing about
charsets and nevertheless expect to see files displayed correctly.

OK, I don't like the situation either, but one has to be a realist to work
well. Hence we need a BOM for plain-text files. It's really easy to add
when writing the file and to remove when reading it, and even if you forget
to remove it, its presence is benign. Please remember: in your own world you can
do really cool things, since you have a lot of control over what's on your
machine. But in a general environment, there are many things totally beyond
your control. Good standards help in reducing the confusion. The BOM for
plain-text files has turned out to be one of those standards, ironic as it
may seem.
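
[Editorial sketch, not part of the original message.] The add-on-write,
strip-on-read handling of the BOM described above might look roughly like
the following in Python. The function names and the Latin-1 fallback for
files without a BOM are assumptions made for illustration only; the message
does not prescribe any particular fallback charset.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def write_text_with_bom(path, text):
    """Write text as UTF-8, prefixed with the BOM as a charset signature."""
    with open(path, "wb") as f:
        f.write(UTF8_BOM + text.encode("utf-8"))


def read_text(path, fallback="latin-1"):
    """Read a plain-text file, using a leading BOM to detect UTF-8.

    If the BOM is present it is removed and the rest is decoded as UTF-8.
    Otherwise the bytes are decoded with `fallback`, which is a
    hypothetical default chosen only for this sketch.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8")
    return data.decode(fallback)
```

Python's built-in "utf-8-sig" codec does the same prepend and strip steps
for UTF-8, so a real program could simply open files with that encoding.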

Thanks
Murray

        -----Original Message-----
        From: Markus Kuhn [SMTP:Markus.Kuhn@cl.cam.ac.uk]
        Sent: Friday, October 16, 1998 4:08 PM
        To: Unicode List
        Subject: Re: UTF-8 in email

        Murray Sargent wrote on 1998-10-16 20:55 UTC:
        > You can still create plain-text UTF-8 files without the leading
        > BOM. But they might not get read correctly by the software out
        > there...

        I hope that people who write the "software out there" will also
        provide other means to signal that the character set is UTF-8, at
        least under Unix.

        I hope to be able in the near future to use UTF-8 on my Linux box as
        my *only* character encoding, just like Plan9 has done it
        successfully for years. All my ISO 8859-1 files will be converted
        one day to UTF-8 and then ISO 8859-1 will not be used any more in
        plain text files and file names. Incoming text files (mail, web,
        etc.) will be autoconverted from whatever to UTF-8 or will manually
        go through recode.

        I hope that I can bring all software that I use to be UTF-8 aware by
        setting an environment variable (e.g., LC_CTYPE=UTF-8) or something
        like that, and I hope that then I do *not* have to prefix every
        single plain-text file under Unix with a BOM (horrible idea ...).
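
[Editorial sketch, not part of the original message.] One way a program
could honour an environment-variable signal such as the one hoped for above
is to inspect the POSIX locale variables. The function name is hypothetical,
and accepting a bare value like "UTF-8" (as in the LC_CTYPE=UTF-8 example)
is an assumption of this sketch; common locale names look like
"en_US.UTF-8".

```python
import os


def locale_signals_utf8(environ=None):
    """Return True if the locale environment variables name UTF-8.

    Follows the POSIX precedence LC_ALL, then LC_CTYPE, then LANG: the
    first variable that is set and non-empty decides the result.
    """
    env = os.environ if environ is None else environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # Drop an "@modifier" suffix, then take the part after the
            # last "." as the codeset (the whole value if there is no ".").
            codeset = value.split("@", 1)[0].rsplit(".", 1)[-1]
            return codeset.replace("-", "").lower() == "utf8"
    return False
```

A reader of plain-text files could then decode as UTF-8 whenever this
returns True, with no BOM required in the files themselves.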

        Markus

        --
        Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
        email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>


