From: Markus Scherer (markus.icu@gmail.com)
Date: Tue Mar 28 2006 - 18:53:21 CST
On 3/28/06, Dean Harding <dean.harding@dload.com.au> wrote:
> It's also (unfortunately) quite popular with a lot of email servers. I don't
> really know why, because UTF-8 + quoted-printable would have been
> almost as efficient, and you wouldn't need some custom encoder/decoder
> that's almost-but-not-quite Base64 encoding...
Let's count:
For example, a common Chinese character (from the BMP Unihan block)
takes the following number of bytes:
UTF-16: 2
UTF-16+base64: 2.67
UTF-7: 2.67 (plus a little overhead, less for longer runs of non-ASCII chars)
UTF-8: 3
UTF-8+base64: 4
UTF-8+quoted-printable: 9
For Latin (non-ASCII), Greek, Cyrillic, Arabic, and Hebrew, the numbers are:
UTF-16: 2
UTF-16+base64: 2.67
UTF-7: 2.67 (plus a little overhead...)
UTF-8: 2
UTF-8+base64: 2.67
UTF-8+quoted-printable: 6
In other words, for email, if you don't want to trust that the whole
network is 8BIT-safe, UTF-7 is reasonably efficient.
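
The counts above can be checked with a short script. This is only a sketch using Python's standard codecs; the function name bytes_per_char and the sample characters are my own choices for illustration. The run length of three characters is picked so that base64 needs no padding and quoted-printable inserts no soft line breaks.

```python
import base64
import quopri


def bytes_per_char(text):
    """Return the average encoded size per character for several encodings.

    Assumes `text` is a short run of non-ASCII BMP characters; long inputs
    would pick up quoted-printable soft line breaks and skew the result.
    """
    n = len(text)
    utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark
    utf8 = text.encode("utf-8")
    return {
        "UTF-16": len(utf16) / n,
        "UTF-16+base64": len(base64.b64encode(utf16)) / n,
        # Includes the shift characters around the run, so short runs
        # come out somewhat above the bare base64 figure.
        "UTF-7": len(text.encode("utf-7")) / n,
        "UTF-8": len(utf8) / n,
        "UTF-8+base64": len(base64.b64encode(utf8)) / n,
        "UTF-8+quoted-printable": len(quopri.encodestring(utf8)) / n,
    }


if __name__ == "__main__":
    samples = {
        "Chinese (U+4E2D)": "\u4e2d" * 3,
        "Greek (U+03B1)": "\u03b1" * 3,
    }
    for label, run in samples.items():
        print(label)
        for name, size in bytes_per_char(run).items():
            print(f"  {name}: {size:.2f}")
```

The UTF-7 figure for such a short run is higher than 2.67 because the shift overhead is spread over only three characters; it approaches 2.67 as the run gets longer.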
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.