Re: Least used parts of BMP.

From: Doug Ewell (
Date: Wed Jun 02 2010 - 23:43:17 CDT

    Michael D'Errico <mike dash list at pobox dot com> wrote:

    > If you want a really fast alternate encoding, you could encode all of
    > Unicode in at most 3 bytes. Use the high bit as a "continuation" bit
    > and the lower 7 bits as the data.
    > ASCII gets passed through unchanged.

    This is essentially what I was going to suggest to Kannan, since
    avoidance of ASCII bytes, nulls, etc. is not relevant to his use case.
    The conversion is lightning-fast; it can be optimized to be even faster
    than UTF-8.
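
A minimal sketch of the scheme described above, in Python. The thread does not specify byte order or which byte carries the continuation flag, so this sketch assumes the most significant 7-bit group comes first and that a set high bit means "more bytes follow". The function names `encode` and `decode` are illustrative only and do not come from the original messages.

```python
def encode(text: str) -> bytes:
    """Encode each code point in 1 to 3 bytes, 7 data bits per byte.

    Assumption: most significant group first; high bit set on every
    byte except the last one of a sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # 7 bits: ASCII passes through unchanged
            out.append(cp)
        elif cp < 0x4000:                  # 14 bits: two bytes
            out.append(0x80 | (cp >> 7))
            out.append(cp & 0x7F)
        else:                              # up to 21 bits: three bytes (covers U+10FFFF)
            out.append(0x80 | (cp >> 14))
            out.append(0x80 | ((cp >> 7) & 0x7F))
            out.append(cp & 0x7F)
    return bytes(out)


def decode(data: bytes) -> str:
    """Inverse of encode(); performs no validation of malformed input."""
    chars = []
    cp = 0
    for b in data:
        cp = (cp << 7) | (b & 0x7F)        # accumulate 7 data bits
        if b < 0x80:                       # high bit clear: last byte of this code point
            chars.append(chr(cp))
            cp = 0
    return "".join(chars)
```

Since U+10FFFF fits in 21 bits, three 7-bit groups are always enough. Note that the final byte of a multi-byte sequence falls in the ASCII range (for example, U+00E9 becomes the bytes 0x81 0x69), which is the property that makes the encoding unsuitable where ASCII bytes must be avoided, and acceptable where they need not be.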

    Doug Ewell  |  Thornton, Colorado, USA  |
    RFC 5645, 4645, UTN #14  |  ietf-languages @
