RE: Invalid code points

From: Phillips, Addison (addison@amazon.com)
Date: Mon Jun 01 2009 - 09:56:13 CDT

    Uh... the IETF does not define UTF-8. The Unicode Consortium does. But even if you want to build on the IETF documents, RFC 3629 was published six years ago. Basing a new implementation on something published 11 years ago (RFC 2279) and obsolete for the last six years? Not a good idea.

    One problem with using a non-conformant UTF-8-like encoding to transmit non-textual data is that other processes are probably sensitive to whether the UTF-8 they receive is well-formed. Such a process might reinterpret the data using another character encoding, or stop processing it altogether and treat it as an error (or a security risk).
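
    As a rough illustration of that sensitivity, the sketch below feeds a few byte sequences to Python's built-in strict UTF-8 decoder, which stands in here for "another process". The sample names and the classify helper are made up for this example; the byte values are ones the older, wider scheme would produce but Unicode's UTF-8 forbids, plus one well-formed sequence for comparison.

```python
# Byte sequences to test against a strict UTF-8 decoder.
# The labels and this table are illustrative, not from any specification text.
samples = {
    "five-byte form (pre-RFC 3629 style)": b"\xf8\x88\x80\x80\x80",
    "four-byte form above U+10FFFF": b"\xf4\x90\x80\x80",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "well-formed U+10FFFF": b"\xf4\x8f\xbf\xbf",
}


def classify(data: bytes) -> str:
    """Return 'ok' if data decodes as well-formed UTF-8, else 'rejected'."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return "rejected"
    return "ok"


results = {name: classify(data) for name, data in samples.items()}
for name, verdict in results.items():
    print(f"{name}: {verdict}")

# A process that gives up on UTF-8 and falls back to a single-byte encoding
# will "succeed", but the result is unrelated text, not the original integers.
fallback = samples["five-byte form (pre-RFC 3629 style)"].decode("latin-1")
print(len(fallback))
```

    Only the last sample is accepted; the others raise a decoding error, which is the "stop processing" case. The Latin-1 fallback shows the "reinterpret" case: it never fails, so the data silently becomes five unrelated characters.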

    Besides, there are many fine transfer encoding syntaxes available for transmitting binary data that don't involve pretending that the data is text.
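
    One such option, sketched here under assumptions: pack the integers as fixed-width binary and carry them as Base64, which is plain ASCII and makes no claim to be character data. The function names and the choice of unsigned 32-bit big-endian integers are hypothetical, picked only for this example.

```python
import base64
import struct


def pack_integers(values):
    """Serialize unsigned 32-bit integers as big-endian bytes, then Base64 text."""
    raw = struct.pack(f">{len(values)}I", *values)
    return base64.b64encode(raw).decode("ascii")


def unpack_integers(text):
    """Reverse pack_integers: Base64 text back to a list of integers."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f">{len(raw) // 4}I", raw))


# Includes values beyond U+10FFFF, which well-formed UTF-8 cannot represent.
values = [0x41, 0x10FFFF, 0x110000, 0x7FFFFFFF]
wire = pack_integers(values)
restored = unpack_integers(wire)

print(wire)
print(restored == values)
```

    The integers round-trip unchanged, and nothing downstream is asked to treat them as text.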

    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Hans Aberg
    > Sent: Monday, June 01, 2009 12:22 AM
    > To: Doug Ewell
    > Cc: Unicode Mailing List
    > Subject: Re: Invalid code points
    >
    > On 1 Jun 2009, at 00:25, Doug Ewell wrote:
    >
    > >> I think also strictly speaking there are two UTF-8s: one which does
    > >> not have the integer limitations that are used in Unicode. This
    > >> could be used to convert integers sequences into byte sequences
    > >> which then do not have Unicode character interpretation.
    > >
    > > There is only one UTF-8, the one defined by Unicode and ISO/IEC
    > > 10646, which maps valid Unicode/10646 scalar values to sequences of
    > > bytes. Anything else is not UTF-8. Keep repeating this to yourself.
    >
    > I was just reading the successor sequence of RFCs:
    > http://tools.ietf.org/html/rfc2044
    > http://tools.ietf.org/html/rfc2279
    > http://tools.ietf.org/html/rfc3629
    >
    > The last one restricts UTF-8 to the Unicode range, the limitations of
    > UTF-16, but the others do not.
    >
    > Hans
    >
    >


