Re: Communicator Unicode

From: Yung-Fong Tang (ftang@netscape.com)
Date: Fri Sep 12 1997 - 18:15:12 EDT


On the positive side, some specs still need something like UTF-7. One
example is the new IMAP4rev1 spec, RFC 2060. It uses a MODIFIED version of
UTF-7 (it is not UTF-7 itself) for mailbox names instead of UTF-8. Why? Ask
the author (I forget the reason he gave me...).
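
For anyone who has not looked at it, here is a rough sketch of how the
mailbox-name encoding in RFC 2060 (section 5.1.3) differs from plain UTF-7:
"&" replaces "+" as the shift character, "," replaces "/" in the base64
alphabet, a literal "&" is written "&-", and every character outside
printable ASCII goes through base64 of UTF-16BE with no padding. The function
name encode_imap_utf7 is my own, not something from the spec or from any
library, and this is only an illustration, not a tested implementation.

```python
import base64


def encode_imap_utf7(name: str) -> str:
    """Encode a mailbox name in the modified UTF-7 of RFC 2060, section 5.1.3.

    Hypothetical helper for illustration; not part of any standard library.
    """
    out = []
    pending = []  # run of characters that must go through modified base64

    def flush():
        # Emit the pending run as "&" + modified base64 of UTF-16BE + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no "=" padding, and "," instead of "/".
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # literal ampersand
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable ASCII represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


# The example mailbox name given in RFC 2060 itself:
print(encode_imap_utf7("~peter/mail/\u53f0\u5317/\u65e5\u672c\u8a9e"))
# prints ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

Note that "/" stays usable as a hierarchy delimiter because it never appears
inside an encoded run, which plain UTF-7 cannot promise.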

David Goldsmith wrote:

> Markus Kuhn (mskuhn@cip.informatik.uni-erlangen.de) wrote:
>
> >UTF-8 is clearly the preferred one. UTF-7 is a hack to create a
> >base64-style encoding for Unicode characters that was once intended
> >for e-mail usage. It badly messes up the distinction between the
> >character set and the transport encoding in MIME and should be
> >forgotten quickly. It is of zero relevance for HTTP (which offers
> >binary transparency), and thanks to ESMTP and the security upgrades to
> >practically all sendmail installations all over the world that were
> >necessary in the past 24 months due to published attack software, the
> >7-bit problem of e-mail is also mostly gone today.
> >
> >UTF-7 is clearly deprecated. Unicode and ISO 10646 are standards that
> >will continually evolve, and there is very little an implementation
> >can do with the knowledge of the version number except always using
> >the most recent available font. Therefore the UNICODE-1-1-UTF-8
> >identifier has never been a good idea in the first place.
>
> Sigh. I want to clear up a couple of misconceptions here. Of course, I'm
> the original author, so take that into consideration.
>
> UTF-7 was intended to produce a Unicode encoding that would reduce to
> (mostly) ASCII in the limiting case. Quoted-printable encoded UTF-8 has
> the same property, but suffers large expansion for non-Roman text. Like
> quoted-printable UTF-8, UTF-7 was intended to be readable by a recipient
> who didn't support MIME or Unicode (the latter is still quite relevant).
>
> As for the distinction between character set and transport encoding,
> UTF-7 took the form it did after close consultation with the IETF and the
> appropriate ietf-charset people. In fact, it was proposed at one point
> that it be made a content transfer encoding, and that was explicitly
> deprecated by the IETF representatives, as UTF-7 is not general-purpose
> enough. I wouldn't have minded either way. If UTF-7 is too much like a
> transfer encoding, then so are a lot of other charset encodings, like HZ.
>
> Finally, although SMTP agents may have gotten more 8-bit savvy, most mail
> clients I've seen on Macs and Wintel PCs still encode 8 bit content as
> quoted printable or Base64 *all the time*.
>
> I agree that UTF-7 is of marginal relevance these days, but it is not
> deprecated in any formal sense, and is still useful in some situations.
>
> By the way, the version number of Unicode in the charset names was also
> at the insistence of the IETF. It happened at a time when there was still
> deep suspicion of Unicode. The newer registrations are dropping the
> version numbers.
>
> David Goldsmith
> Architect
> International, Text, and Graphics Department
> Apple Computer, Inc.
> goldsmith@apple.com
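
David's point about expansion for non-Roman text is easy to check. The
sketch below compares the size of the same text as raw UTF-8, as
quoted-printable UTF-8, and as UTF-7, using the codecs that ship with Python.
The helper name encoded_sizes and the sample strings are my own choices, and
the exact byte counts depend on the encoder's optional choices, so treat the
numbers as indicative only.

```python
import quopri


def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text under three encodings.

    Hypothetical helper for illustration only.
    """
    utf8 = text.encode("utf-8")
    return {
        "utf-8 (raw, 8-bit)": len(utf8),
        # Quoted-printable turns every non-ASCII byte into "=XX" (3 bytes).
        "utf-8 + quoted-printable": len(quopri.encodestring(utf8)),
        # UTF-7 packs UTF-16 code units into base64 (about 8/3 bytes each).
        "utf-7": len(text.encode("utf-7")),
    }


for sample in ("Hello, world", "\u65e5\u672c\u8a9e"):
    print(repr(sample), encoded_sizes(sample))
```

For plain ASCII all three come out the same length, which is the "reduces to
ASCII" property. For the Japanese sample the quoted-printable form is roughly
three times the raw UTF-8 size, while UTF-7 stays close to it.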




