Re: The "prohibited" encodings...

From: Doug Ewell (doug@ewellic.org)
Date: Wed Dec 30 2009 - 21:15:00 CST

    Andrew Lipscomb <ewwa at chattanooga dot net> wrote:

    > Except that UTF-32 *isn't* on the banned list that started this
    > thread--discouraged, though, as I understand it. The fourth one was
    > CESU-8 (which, granted, has only one character that can be encoded two
    > ways, the NULL).

    CESU-8 doesn't have any characters that can be encoded two ways. You
    may be thinking of a different encoding.

    CESU-8 is simply UTF-8 applied to UTF-16 code units instead of Unicode
    scalar values. A supplementary character like U+10000 is encoded as <ED
    A0 80 ED B0 80> instead of <F0 90 80 80>. (Note that UTR #26
    incorrectly quotes this as <ED AE 80 ED B0 80>, which is the CESU-8
    encoding for U+F0000, an earlier example.) All BMP characters,
    including U+0000 NULL, are encoded the same in both CESU-8 and UTF-8,
    which of course is the biggest problem with CESU-8: text in the two
    encodings is indistinguishable until a supplementary character appears.
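
    The definition above can be illustrated with a minimal Python sketch.
    The names encode_unit and cesu8_encode are hypothetical helpers made up
    for this example, not part of any library, and the sketch assumes its
    input is a well-formed string with no lone surrogates:

```python
def encode_unit(unit: int) -> bytes:
    """Apply the UTF-8 bit layout to one 16-bit code unit (hypothetical helper)."""
    if unit < 0x80:
        return bytes([unit])
    if unit < 0x800:
        return bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
    # Three-byte form; this also covers surrogate code units D800..DFFF,
    # which real UTF-8 forbids but CESU-8 relies on.
    return bytes([0xE0 | (unit >> 12),
                  0x80 | ((unit >> 6) & 0x3F),
                  0x80 | (unit & 0x3F)])


def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: UTF-8 applied to UTF-16 code units (sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP character: one code unit, same bytes as UTF-8.
            out += encode_unit(cp)
        else:
            # Supplementary character: split into a surrogate pair,
            # then encode each surrogate as its own three-byte sequence.
            v = cp - 0x10000
            out += encode_unit(0xD800 | (v >> 10))
            out += encode_unit(0xDC00 | (v & 0x3FF))
    return bytes(out)


print(cesu8_encode("\U00010000").hex())    # CESU-8 form of U+10000
print("\U00010000".encode("utf-8").hex())  # UTF-8 form of U+10000
print(cesu8_encode("\U000F0000").hex())    # CESU-8 form of U+F0000
```

    For BMP-only input, including U+0000, the output of cesu8_encode
    matches Python's built-in str.encode("utf-8") byte for byte.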

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    

