Re: UTF-7 - I'm not really smarter

From: Markus Scherer (markus.icu@gmail.com)
Date: Tue Mar 28 2006 - 18:53:21 CST

Next message: Doug Ewell: "Re: UTF-7 - I'm not really smarter"

Previous message: Markus Scherer: "Re: UTF-7"
In reply to: Dean Harding: "RE: UTF-7 - I'm not really smarter"
Next in thread: Keutgen, Walter: "RE: UTF-7 - I'm not really smarter"

On 3/28/06, Dean Harding <dean.harding@dload.com.au> wrote:
> It's also (unfortunately) quite popular with a lot of email servers. I don't
> really know why, because UTF-8 + quoted-printable would have been just
> almost as efficient, and you wouldn't need some custom encoder/decoder
> that's almost-but-not-quite Base64 encoding...

Let's count:
For example, a common Chinese character (from the BMP Unihan block)
takes the following number of bytes:
UTF-16: 2
UTF-16+base64: 2.67
UTF-7: 2.67 (plus a little overhead for the '+' and '-' that shift into and out of base64, less per character for longer runs of non-ASCII chars)
UTF-8: 3
UTF-8+base64: 4
UTF-8+quoted-printable: 9
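One way to reproduce these counts is a short Python 3 sketch using the standard base64 and quopri modules and the built-in utf-7 codec. It encodes a long run of one character and divides the output size by the run length. The function name cost_per_char, the run length of 3000 and the sample character U+4E2D are illustrative choices, not part of the original message. Like the figures above, it does not count MIME line breaks.

```python
import base64
import quopri


def cost_per_char(ch: str, n: int = 3000) -> dict[str, float]:
    """Encoded bytes per character for a run of n copies of ch (hypothetical helper).

    n is a multiple of 3 so that base64 and UTF-7 end on a whole group
    and padding does not skew the averages.
    """
    text = ch * n
    utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark
    utf8 = text.encode("utf-8")
    # quopri inserts soft line breaks ("=\n") to keep lines under 76 chars;
    # strip them so only the "=XX" escapes are counted.
    qp = quopri.encodestring(utf8).replace(b"=\n", b"")
    sizes = {
        "UTF-16": len(utf16),
        "UTF-16+base64": len(base64.b64encode(utf16)),
        "UTF-7": len(text.encode("utf-7")),  # one leading '+' and one trailing '-'
        "UTF-8": len(utf8),
        "UTF-8+base64": len(base64.b64encode(utf8)),
        "UTF-8+quoted-printable": len(qp),
    }
    return {name: size / n for name, size in sizes.items()}


if __name__ == "__main__":
    for name, cost in cost_per_char("\u4e2d").items():  # U+4E2D, a common CJK ideograph
        print(f"{name:24} {cost:.2f}")
```

For U+4E2D this prints 2.00, 2.67, 2.67, 3.00, 4.00 and 9.00. Passing a Greek or Cyrillic letter instead gives the second set of numbers.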

For non-ASCII Latin, Greek, Cyrillic, Arabic, and Hebrew characters (code points U+0080..U+07FF), the numbers are:
UTF-16: 2
UTF-16+base64: 2.67
UTF-7: 2.67 (plus a little overhead...)
UTF-8: 2
UTF-8+base64: 2.67
UTF-8+quoted-printable: 6
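The "little overhead" for UTF-7 comes from the '+' that opens each base64 run, the '-' that closes it, and rounding the last partial 6-bit group up to a whole character. A minimal Python 3 sketch, assuming the built-in utf-7 codec and encoding each run on its own so the closing '-' is always written, shows how that fixed cost spreads out as the run grows. utf7_cost_per_char is a hypothetical helper name.

```python
def utf7_cost_per_char(ch: str, n: int) -> float:
    """UTF-7 output bytes per character for a run of n copies of ch.

    The run is encoded on its own, so the output is '+', the modified-base64
    form of the UTF-16 code units, and a closing '-'.
    """
    return len((ch * n).encode("utf-7")) / n


if __name__ == "__main__":
    zhe = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE: 2 bytes in UTF-16 and in UTF-8
    for n in (1, 2, 3, 10, 100, 1000):
        print(f"run of {n:4d}: {utf7_cost_per_char(zhe, n):.3f} bytes/char")
```

This prints 5.000, 4.000, 3.333, 2.900, 2.690 and 2.669 bytes per character, approaching 8/3 (about 2.67). When the run is followed by a character that is not in the base64 alphabet, the '-' can be omitted, so real text may come out slightly smaller.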

In other words, for email, if you don't want to trust that the whole
network is 8BIT-safe, UTF-7 is reasonably efficient.

markus

--
Opinions expressed here may not reflect my company's positions unless
otherwise noted.

This archive was generated by hypermail 2.1.5 : Tue Mar 28 2006 - 18:54:22 CST