Re: unicode conversion in any smtp server (eg sendmail)

From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Apr 22 2002 - 17:49:07 EDT


On Mon, 22 Apr 2002, David Starner wrote:

> On Mon, Apr 22, 2002 at 08:06:51PM +0100, x0638890 wrote:
> > Hello,
> >
> > Can anyone tell me if there is any smtp server (eg sendmail) which can do
> > automatic unicode to Windows 1252 codepage conversion of incoming emails ?
>
> Probably not. Why would you want to do this? It's working at the wrong
> level; the SMTP server should pass the message through, and the mail
> client should do the conversion.

 I agree. If you really need to do it, you can do the conversion at the MDA
(Mail Delivery Agent) level. Something like the following can be put
in procmailrc if procmail (http://www.procmail.org) is used as the MDA on
Unix(-like) hosts and you have 'iconv' (other encoding converters can be
used instead, such as 'uniconv' that comes with Yudit, one built upon ICU,
or native2ascii in the JDK).

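# Only touch text/plain and text/html messages labelled as UTF-8.
# Assumes the body is not base64 or quoted-printable encoded.
:0
* ^Content-Type: text/(plain|html); .*charset=.?utf-8
{
  # Filter the body through iconv; -c drops unconvertible characters.
  :0 fbw
  |iconv -c -f UTF-8 -t Windows-1252

  # Rewrite the Content-Type header to match the converted body.
  :0 fhw
  * ^Content-Type: text/plain
  |formail -c -i "Content-Type: text/plain; charset=Windows-1252"

  # E: tried only if the preceding recipe did not match.
  :0 Efhw
  * ^Content-Type: text/html
  |formail -c -i "Content-Type: text/html; charset=Windows-1252"
}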
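
To make clear what the body filter above does to the text, here is a
minimal Python sketch of roughly the same conversion. It is only an
illustration, not part of the recipe: the function name utf8_to_cp1252
is made up, and error handling is approximated with Python's "ignore"
mode, which I assume is close enough to 'iconv -c' for this purpose.

```python
def utf8_to_cp1252(body: bytes) -> bytes:
    """Convert UTF-8 bytes to Windows-1252, dropping what does not fit.

    Hypothetical helper that roughly mirrors
    `iconv -c -f UTF-8 -t Windows-1252`.
    """
    # Decode the incoming bytes; malformed UTF-8 sequences are skipped.
    text = body.decode("utf-8", errors="ignore")
    # Encode to Windows-1252; characters with no mapping are silently lost.
    return text.encode("cp1252", errors="ignore")


if __name__ == "__main__":
    sample = "café €5 日本".encode("utf-8")
    # The accented letter and the euro sign survive; the kanji are dropped.
    print(utf8_to_cp1252(sample))
```

Note that the conversion is lossy: anything outside the Windows-1252
repertoire simply disappears from the message.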

You also have to modify RFC 2047 encoded headers: decode the RFC 2047
encoded header fields, convert them to Windows-1252, and then encode
them back per RFC 2047. (The last step is not so trivial.) Moreover, the
proliferation of multipart/* messages makes this much more complicated
than before. There's a procmail recipe to filter out all redundant
parts, but I found it's still far from perfect and I had to do a lot
of customization.
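
The three header steps (decode, convert, re-encode) can be sketched
with Python's standard email.header module. This is only a sketch under
stated assumptions: the names TARGET, header_to_text and reencode_header
are made up for illustration, characters that Windows-1252 cannot
represent are replaced with "?", and unknown charset labels in the
input are not handled.

```python
from email.header import Header, decode_header

TARGET = "windows-1252"  # assumed target charset for this sketch


def header_to_text(value: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to Unicode text."""
    pieces = []
    for chunk, charset in decode_header(value):
        # Encoded parts come back as bytes plus a charset label;
        # a value without encoded-words comes back as a plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


def reencode_header(value: str) -> str:
    """Decode a header value, then re-encode it per RFC 2047 in TARGET."""
    text = header_to_text(value)
    # Replace anything the target charset cannot represent with "?".
    safe = text.encode(TARGET, errors="replace").decode(TARGET)
    if safe.isascii():
        return safe  # pure ASCII needs no encoded-word at all
    return Header(safe, charset=TARGET).encode()


if __name__ == "__main__":
    print(reencode_header("Re: =?utf-8?q?caf=C3=A9?="))
```

Even this leaves out structured headers such as From: and To:, where
only the display name may be encoded and the address itself has to be
left alone, which is part of why the last step is not trivial.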

In short, it's much better to leave this task to MUAs than to
mess with incoming messages in MDAs (let alone MTAs). If your MUA is not
MIME or UTF-8 savvy (e.g. Windows Eudora), you'd better upgrade or switch
your MUA.

 Jungshik Shin



This archive was generated by hypermail 2.1.2 : Mon Apr 22 2002 - 18:40:22 EDT