Re: Communicator Unicode

From: Markus Kuhn (mskuhn@cip.informatik.uni-erlangen.de)
Date: Fri Sep 12 1997 - 03:00:36 EDT


Adrian Havill wrote:
> 4) The UTF-8 RFC says "UTF-8" is the "charset" for the MIME, but I see
> "UNICODE-1-1-UTF-7" and "UNICODE-1-1-UTF-8" all over the net for the
> MIME type. Are they both ok? Which is preferred? Which is depreciated?

UTF-8 is clearly the preferred one. UTF-7 is a hack to create a
base64-style encoding for Unicode characters that was once intended
for e-mail usage. It badly messes up the distinction between the
character set and the transport encoding in MIME and should be
forgotten quickly. It is of zero relevance for HTTP (which offers
binary transparency), and thanks to ESMTP and the security upgrades to
practically all sendmail installations all over the world that were
necessary in the past 24 months due to published attack software, the
7-bit problem of e-mail is also mostly gone today.
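
To make the difference concrete, here is a small sketch using Python's
built-in "utf-7" and "utf-8" codecs. The sample string and the helper
name is_seven_bit are only illustrative assumptions, not anything taken
from the RFCs. UTF-7 output stays within 7-bit ASCII because it mixes a
base64-style transfer encoding into the character encoding, while UTF-8
produces bytes with the high bit set and leaves transport to MIME:

```python
# Illustrative sample text containing non-ASCII characters (an assumption).
text = "Grüße, 日本語"

utf7 = text.encode("utf-7")   # base64-style runs introduced by '+'
utf8 = text.encode("utf-8")   # plain 8-bit multi-byte sequences


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (hypothetical helper)."""
    return all(b < 0x80 for b in data)


print(utf7, is_seven_bit(utf7))
print(utf8, is_seven_bit(utf8))

# Both encodings are lossless; they differ only in the bytes they produce.
assert utf7.decode("utf-7") == text
assert utf8.decode("utf-8") == text
```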

UTF-7 is clearly deprecated. Unicode and ISO 10646 are standards that
will continually evolve, and there is very little an implementation
can do with the knowledge of the version number except always using
the most recent available font. Therefore the UNICODE-1-1-UTF-8
identifier has never been a good idea in the first place.

Use UTF-8 and only UTF-8 if you send out data. If you are concerned
about 8-bit e-mail transparency, use quoted-printable or, better, rely
on ESMTP to take care of this in the MTA layer for you. Of course,
for your users' convenience, your software should still accept
whatever labels have been specified in the past.
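
A minimal sketch of that advice, assuming Python's standard email
package: label outgoing text as "utf-8" with a quoted-printable transfer
encoding, and map the older versioned labels onto the plain codecs when
reading. The alias table LEGACY_CHARSETS and the function names
build_message and decode_body are hypothetical, chosen only for this
illustration:

```python
from email.message import EmailMessage

# Hypothetical alias table: legacy labels seen on the net, mapped to the
# codec that should be used to read them.
LEGACY_CHARSETS = {
    "unicode-1-1-utf-8": "utf-8",
    "unicode-1-1-utf-7": "utf-7",
}


def build_message(text: str) -> EmailMessage:
    """Send side: always label as UTF-8, quoted-printable for 7-bit paths."""
    msg = EmailMessage()
    msg.set_content(text, charset="utf-8", cte="quoted-printable")
    return msg


def decode_body(raw: bytes, charset: str) -> str:
    """Receive side: accept legacy labels as well as the current ones."""
    label = charset.strip().lower()
    return raw.decode(LEGACY_CHARSETS.get(label, label))


msg = build_message("Grüße aus Erlangen\n")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(decode_body("Grüße".encode("utf-8"), "UNICODE-1-1-UTF-8"))
```

If the path is known to be 8-bit clean (ESMTP with 8BITMIME), the same
call with cte="8bit" avoids the quoted-printable expansion.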

Markus

-- 
Dipl.-Inf. Markus Kuhn, Schlehenweg 9, D-91080 Uttenreuth, Germany
mkuhn at acm.org, http://wwwcip.informatik.uni-erlangen.de/~mskuhn
