From: Doug Ewell (doug@ewellic.org)
Date: Wed Jun 02 2010 - 23:43:17 CDT
Michael D'Errico <mike dash list at pobox dot com> wrote:
> If you want a really fast alternate encoding, you could encode all of
> Unicode in at most 3 bytes. Use the high bit as a "continuation" bit
> and the lower 7 bits as the data.
>
> ASCII gets passed through unchanged.
This is essentially what I was going to suggest to Kannan, since
avoidance of ASCII bytes, nulls, etc. is not relevant to his use case.
The conversion is lightning-fast; it can be optimized to be even faster
than UTF-8.
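
A minimal Python sketch of the scheme quoted above, for illustration only. The thread does not say which byte order to use or exactly how the continuation bit is applied, so two assumptions are made here: the most significant 7-bit group comes first, and the high bit is set on every byte except the last one of a character. The function names are made up for this example.

```python
def encode_cp(cp: int) -> bytes:
    """Encode one code point as 1 to 3 bytes, 7 data bits per byte."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp < 0x80:
        # ASCII: a single byte with the high bit clear, i.e. unchanged.
        return bytes([cp])
    if cp < 0x4000:
        # Up to 14 bits: one continuation byte, then the final byte.
        return bytes([0x80 | (cp >> 7), cp & 0x7F])
    # Up to 21 bits, which is enough for everything through U+10FFFF.
    return bytes([0x80 | (cp >> 14), 0x80 | ((cp >> 7) & 0x7F), cp & 0x7F])


def encode(text: str) -> bytes:
    """Encode a whole string, code point by code point."""
    return b"".join(encode_cp(ord(ch)) for ch in text)


def decode(data: bytes) -> str:
    """Inverse of encode(); assumes the same byte order and bit convention."""
    out = []
    cp = 0
    pending = False  # True while in the middle of a multi-byte character
    for b in data:
        cp = (cp << 7) | (b & 0x7F)
        if b & 0x80:
            pending = True       # high bit set: more bytes follow
        else:
            out.append(chr(cp))  # high bit clear: character is complete
            cp = 0
            pending = False
    if pending:
        raise ValueError("truncated input")
    return "".join(out)
```

Under these assumptions, encode("A") is the single byte 0x41, while U+1F600 becomes the three bytes 0x87 0xEC 0x00. The trailing 0x00 shows why the scheme only fits cases where ASCII-valued bytes and nulls inside multi-byte characters are acceptable.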
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s