Re: UTF-7 - I'm not really smarter

From: Markus Scherer (markus.icu@gmail.com)
Date: Tue Mar 28 2006 - 18:53:21 CST

    On 3/28/06, Dean Harding <dean.harding@dload.com.au> wrote:
    > It's also (unfortunately) quite popular with a lot of email servers. I don't
    > really know why, because UTF-8 + quoted-printable would have been
    > almost as efficient, and you wouldn't need some custom encoder/decoder
    > that's almost-but-not-quite Base64 encoding...

    Let's count. A common Chinese character (from the BMP Unihan block)
    takes the following average number of bytes per character:
    UTF-16: 2
    UTF-16+base64: 2.67
    UTF-7: 2.67 (plus a few bytes of shift overhead per run, so less per character for longer runs of non-ASCII chars)
    UTF-8: 3
    UTF-8+base64: 4
    UTF-8+quoted-printable: 9

    For Latin (non-ASCII), Greek, Cyrillic, Arabic, Hebrew the numbers are:
    UTF-16: 2
    UTF-16+base64: 2.67
    UTF-7: 2.67 (plus a little overhead...)
    UTF-8: 2
    UTF-8+base64: 2.67
    UTF-8+quoted-printable: 6

    In other words, for email, if you don't want to trust that the whole
    network is 8BIT-safe, UTF-7 is reasonably efficient.
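
For anyone who wants to check these figures, here is a rough Python sketch. The helper names `qp_length` and `bytes_per_char` are made up for illustration, and the quoted-printable count is a simplification that ignores soft line breaks, so real encoders will come out slightly higher. The UTF-7 figure comes from Python's built-in `utf-7` codec, and other encoders may differ by a byte or two per run.

```python
import base64


def qp_length(data: bytes) -> int:
    """Approximate quoted-printable size, ignoring soft line breaks."""
    # Space and printable ASCII other than '=' pass through as one byte;
    # every other byte is written as a three-byte "=XX" escape.
    return sum(
        1 if (b == 0x20 or (33 <= b <= 126 and b != 0x3D)) else 3
        for b in data
    )


def bytes_per_char(ch: str, run: int = 300) -> dict:
    """Average encoded bytes per character for a run of `run` copies of `ch`."""
    text = ch * run
    utf16 = text.encode("utf-16-be")  # big-endian, no BOM
    utf8 = text.encode("utf-8")
    sizes = {
        "UTF-16": len(utf16),
        "UTF-16+base64": len(base64.b64encode(utf16)),
        "UTF-7": len(text.encode("utf-7")),
        "UTF-8": len(utf8),
        "UTF-8+base64": len(base64.b64encode(utf8)),
        "UTF-8+quoted-printable": qp_length(utf8),
    }
    return {name: round(size / run, 2) for name, size in sizes.items()}


# One CJK character and one Cyrillic character as sample inputs.
for ch in ("中", "я"):
    print(ch, bytes_per_char(ch))
```

The default run of 300 characters is chosen so that the base64 input lengths are multiples of 3 and no padding skews the averages. Passing a small `run` shows how the fixed per-run overhead weighs more heavily on short runs.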

    markus

    --
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

