Re: Invalid code points (was: Re: unicode Digest V10 #106)

From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 01 2009 - 22:38:26 CDT


    Andrew Lipscomb <ewwa at chattanooga dot net> wrote:

    >> In particular, it would be great to know if the range U+0080, ...,
    >> U+009F is invalid.
    >
    > Those code points (encoded properly) are valid. However, their
    > appearance may indicate that an error occurred in processing, as the
    > C1 controls would be rare in real Unicode text (and, with the
    > exception of U+0085, are discouraged in XML). They most often arise by
    > treating Windows-1252 as if it were ISO-Latin-1.
    >
    > In other words, not invalid, but suspicious.

    But once again, this is a question of the accuracy or fidelity of the
    input data, before it was converted to UTF-8. It has nothing to do with
    the validity of the Unicode characters from U+0080 to U+009F, nor of
    their UTF-8 representations.
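
The distinction can be illustrated with a minimal Python sketch. It assumes the common failure case described above, where Windows-1252 bytes are decoded as ISO-8859-1; the sample bytes and the helper name c1_controls are illustrative only and do not come from this thread.

```python
# Byte 0x93 is a left double quotation mark in Windows-1252,
# but ISO-8859-1 maps the same byte to the C1 control U+0093.
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")    # what the author most likely meant
as_latin1 = raw.decode("latin-1")   # what a mislabelled source produces


def c1_controls(text):
    """Return the characters of text that fall in U+0080..U+009F."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


# The mislabelled string is still valid Unicode, and it encodes to
# well-formed UTF-8 that decodes again without error.
utf8 = as_latin1.encode("utf-8")
roundtrip = utf8.decode("utf-8")

# C1 controls appear only in the mislabelled reading. Their presence is a
# hint about the input data, not a validity error.
suspicious = c1_controls(roundtrip)
clean = c1_controls(as_cp1252)
```

Here the UTF-8 decode succeeds in both cases; only a separate heuristic such as c1_controls can flag the text as suspicious, and it cannot prove that a conversion error took place.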

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

