Re: Communicator Unicode

From: Adrian Havill (havill@threeweb.ad.jp)
Date: Mon Sep 15 1997 - 21:15:52 EDT


I wrote:
> 4) The UTF-8 RFC says "UTF-8" is the "charset" for the MIME, but I see
> "UNICODE-1-1-UTF-7" and "UNICODE-1-1-UTF-8" all over the net for the
> MIME type. Are they both ok? Which is preferred? Which is deprecated?

After getting in this morning from the three-day holiday (which I also
took from my machine) and noticing the barrage of posts about the
"all over the net" quote, I have to sheepishly admit that I was
(unintentionally) exaggerating; it was a bad choice of words. The Unicode
samples I've taken come from two mailing lists and three newsgroups: one
from Handai (Osaka Univ's I18N Soft Eng project), a private ML from Kubota
systems, and newsgroups internal to Omron systems. Furthermore, all the
cases of "UNICODE-1-1-UTF-8" are due to the same alphaware EMAC mail
extension... apparently a bug.

The language I should have used was "all over -my- net." Apologies for
the mis-info.

Markus Kuhn wrote:
> [UTF-7] is of zero relevance for HTTP (which offers
> binary transparency), and thanks to ESMTP and the security upgrades to
> practically all sendmail installations all over the world that were
> necessary in the past 24 months due to published attack software, the
> 7-bit problem of e-mail is also mostly gone today.

Um, many Japanese ISPs using sendmail still intentionally strip the eighth
bit in e-mail to force their users to conform to standards. (Many
so-called Japanese e-mail programs let one send and receive Shift-JIS and
EUC-JP, which causes great grief for ISPs when they have to explain to
customers why their Japanese e-mail arrives correctly at x & y but not z.)
Stripping the eighth bit forces their customers to use ISO-2022-JP, a
7-bit encoding that all Japanese e-mail/news software is able to handle.
If all so-called Japanese software worked properly, this wouldn't be a
problem, and ISPs wouldn't have to "censor" the eighth bit. But most of
it doesn't.
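
To illustrate the eighth-bit point, here is a minimal Python sketch. It is not taken from any ISP's actual sendmail configuration. The helper names strip_eighth_bit and survives_7bit and the sample string are hypothetical, chosen only for this example, and the relay is assumed simply to clear the high bit of every byte. Under that assumption it checks which encodings round-trip intact:

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as an assumed 7-bit relay would."""
    return bytes(b & 0x7F for b in data)


def survives_7bit(text: str, codec: str) -> bool:
    """Return True if text still decodes to itself after bit stripping."""
    wire = strip_eighth_bit(text.encode(codec))
    try:
        return wire.decode(codec) == text
    except UnicodeDecodeError:
        return False


sample = "日本語"  # hypothetical sample text

# ISO-2022-JP and UTF-7 use only 7-bit bytes on the wire;
# Shift-JIS, EUC-JP and UTF-8 rely on bytes with the high bit set.
for codec in ("iso2022_jp", "shift_jis", "euc_jp", "utf_7", "utf_8"):
    print(codec, survives_7bit(sample, codec))
```

In this sketch only iso2022_jp and utf_7 report True. The other three come back as mangled ASCII, which is the "gets sent correctly to x & y but not z" symptom seen from the receiving end.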

> UTF-7 is clearly deprecated.

Says who? Popular rule? I am fully aware of the drawbacks of UTF-7 (and
the benefits), but I have yet to see any word about UTF-7 being
officially deprecated.

-- 
Adrian Havill <URL:http://www.threeweb.ad.jp/>
Engineering Division, System Planning & Production Section


