Re: Please help: Unicode sig in Hotmail

From: jshin@mailaps.org
Date: Fri Apr 12 2002 - 16:59:38 EDT


On Fri, 12 Apr 2002, Stefan Persson wrote:

> From: <jshin@mailaps.org>

> > Hotmail and most other webmail services used to have a lot of things to
> > be desired in terms of I18N and MIME standard compliance. However,
> > recently they got much better and I'm almost sure there's a way to
                                          ^^^^^^^^^^^^
> > specify that you want to encode your outgoing emails in Unicode(UTF-8).

> Hotmail and Yahoo do *not* support UTF-8 in any way.

  Yeah, I was wrong to assume that in 'this age of Unicode'
Hotmail, a subsidiary of the company which spearheaded Unicode development
(among other things, by adopting Unicode in its OS and products), would be
likely to support UTF-8.

  Anyway, UTF-8 support is not all there is to I18N and MIME compliance.
They're not perfect, but they're much better than 5 years ago, when
they didn't have any inkling of I18N and MIME (RFC 2045-2049, 2184,
2231, etc). Back then, virtually all web mail services treated everything
as either US-ASCII or ISO-8859-1 (I should have launched a web mail
service then :-) ). Still, a lot remains to be improved. Yahoo-mail now
knows about RFC 2047 encoding of email headers, but it 'overdoes' it by
encoding everything thrown at it even when there's no need for it. Often
it produces something silly like this:

Subject: =?EUC-KR?Q?=41=42=43=44=20=41=42?=

given the string 'ABCD AB', which is plain US-ASCII and needs no encoding.
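
A minimal Python sketch of the 'encode only when needed' behaviour, using
the standard email.header module. The helper names encode_subject and
decode_subject and the default charset are assumptions made for
illustration, not anything a web mail service is known to run, and the
sketch ignores the corner case of an ASCII subject that already looks like
an encoded-word.

```python
from email.header import Header, decode_header, make_header


def encode_subject(text, charset="euc-kr"):
    """Return a Subject value, applying RFC 2047 only when it is required."""
    if text.isascii():
        # Pure US-ASCII can go into the header verbatim.
        return text
    # Non-ASCII text is wrapped in one or more encoded-words.
    return Header(text, charset).encode()


def decode_subject(value):
    """Turn a possibly RFC 2047-encoded header value back into text."""
    return str(make_header(decode_header(value)))
```

With these, decode_subject('=?EUC-KR?Q?=41=42=43=44=20=41=42?=') gives
back 'ABCD AB', and encode_subject('ABCD AB') leaves the string untouched.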

In another message, Ben Monroe wrote:

> As far as I know, Hotmail, when sent from the web interface at
> www.hotmail.com , skips the charset= line in the English interface. Most
> of the e-mail I read and write are in Japanese, so I had some problems
> at one time. I tried various things and noticed that if you change the
> interface language (Options --> Language), Hotmail inserts a charset=
> line common with that language.

> The problem is that Hotmail is leaving off the charset= line in English
> mode and does not provide a convenient method to set it to what you wish.

  Tying the interface language to the encoding/MIME charset appears to
be one of the common mistakes made by Web applications. Google used to do
that (or still does). They have little idea that the interface language
and the encoding/MIME charset choice are, more often than not, orthogonal
to and independent of each other, as your case demonstrates. A lot of
people just stick to the English interface because in many cases the
translation into their native language is so poor (and even cryptic) that
it's rather hard to understand. As everyone on this list knows, this
doesn't mean that they don't want to exchange data and emails in their
native languages (or other foreign languages) that require encodings
other than US-ASCII.
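
One way to keep the two settings apart, sketched in Python with the
standard email.message module. The Prefs record and the build_message
helper are hypothetical names; the only point is that the charset comes
from its own setting and is never derived from the interface language.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class Prefs:
    ui_language: str    # language of menus and buttons, e.g. "en"
    mail_charset: str   # MIME charset for outgoing mail, e.g. "iso-2022-jp"


def build_message(prefs, body):
    """Build a text/plain message whose charset follows mail_charset."""
    msg = EmailMessage()
    # The charset is always stated explicitly and is taken from its own
    # preference; ui_language plays no part in the choice.
    msg.set_content(body, charset=prefs.mail_charset)
    return msg
```

A user with Prefs(ui_language="en", mail_charset="iso-2022-jp") then gets
English menus and still sends mail labelled charset="iso-2022-jp".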

  A similar mistake is made by some Unix programs which use
the value of the LC_MESSAGES locale category instead of LC_CTYPE to
determine the encoding/charset to use. (I usually have LC_CTYPE set to
ko_KR.UTF-8 with LC_MESSAGES set to C/POSIX or en_US.UTF-8.)
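
To make the difference concrete, here is a rough Python sketch of the
POSIX lookup order (LC_ALL first, then the category's own variable, then
LANG) applied to an environment like mine. effective_locale and codeset_of
are hypothetical helpers written for this illustration; a real program
would call setlocale() and then query nl_langinfo(CODESET) rather than
parse the variables by hand.

```python
def effective_locale(env, category):
    """Resolve the locale name for one category from an environment dict."""
    # POSIX precedence: LC_ALL overrides LC_<category>, which overrides LANG.
    # Variables that are unset or empty are skipped.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codeset_of(locale_name):
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    base = locale_name.split("@", 1)[0]
    _, _, codeset = base.partition(".")
    return codeset or None


env = {"LC_CTYPE": "ko_KR.UTF-8", "LC_MESSAGES": "C"}
right = codeset_of(effective_locale(env, "LC_CTYPE"))     # what to consult
wrong = codeset_of(effective_locale(env, "LC_MESSAGES"))  # the mistake
```

Here right is 'UTF-8' while wrong is None: a program that consults
LC_MESSAGES finds no codeset at all and would presumably fall back to
US-ASCII, even though the user asked for UTF-8 text handling.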

> Perhaps one of the other interface languages always uses UTF-8? Perhaps
> you can try this out, though I doubt it.

  How about languages/scripts for which no 'legacy encoding' is widely
used and supported, so that Unicode is the first widely used character
set? Ethiopic comes to mind, but I doubt Hotmail is localized
into Ethiopic. Even if it is, you would need to know Ethiopic :-)

  Jungshik Shin


