Re: Hypersurrogates

From: Doug Ewell (doug@ewellic.org)
Date: Sat Aug 29 2009 - 00:33:42 CDT


    Benjamin M Scarborough <benjamin dot scarborough at student dot utdallas
    dot edu> wrote:

    > †If my understanding of UTF-8 is correct, U+7FFFFFFF would be FD BF BF
    > BF BF BF and U+FFFFFFFFFFFFFFFF would be FF BE 8F BF BF BF BF BF BF BF
    > BF BF BF. 13 bytes, isn't that fun?

    Nope. Even in the original definition, UTF-8 was limited to 31-bit
    scalar values represented in 6 bytes, and FE and FF were explicitly
    guaranteed never to appear in a valid UTF-8 stream.
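
    As a rough illustration of that limit, here is a minimal Python sketch
    of the original six-byte scheme. The function name encode_utf8_legacy
    is made up for this example and is not part of any library, and the
    sketch deliberately ignores the later restrictions (the U+10FFFF
    ceiling and the exclusion of surrogates).

```python
def encode_utf8_legacy(cp: int) -> bytes:
    """Encode a value with the original UTF-8 scheme (up to 6 bytes, 31 bits).

    Hypothetical helper for illustration only; modern UTF-8 stops at
    4 bytes and U+10FFFF.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("original UTF-8 covers only 31-bit values")
    if cp < 0x80:
        return bytes([cp])  # single-byte ASCII range

    # (exclusive upper bound, total length in bytes, lead-byte marker)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break

    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6

    # Whatever bits remain go into the lead byte.
    return bytes([lead | cp] + tail[::-1])


if __name__ == "__main__":
    print(encode_utf8_legacy(0x7FFFFFFF).hex(" ").upper())
```

    Under this sketch U+7FFFFFFF does come out as FD BF BF BF BF BF, and
    the largest lead byte the scheme can produce is FD, so FE and FF never
    occur. Anything above 31 bits is simply rejected.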

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

