From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 01 2009 - 22:38:26 CDT
Andrew Lipscomb <ewwa at chattanooga dot net> wrote:
>> In particular, it would be great to know if the range U+0080, ...,
>> U+009F is invalid.
>
> Those code points (encoded properly) are valid. However, their
> appearance may indicate that an error occurred in processing, as the
> C1 controls would be rare in real Unicode text (and, with the
> exception of U+0085, are discouraged in XML). They most often arise by
> treating Windows-1252 as if it were ISO-Latin-1.
>
> In other words, not invalid, but suspicious.
But once again, this is a question of the accuracy or fidelity of the
input data, before it was converted to UTF-8. It has nothing to do with
the validity of the Unicode characters from U+0080 to U+009F, nor of
their UTF-8 representations.
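
To make the distinction concrete, here is a minimal Python sketch of how such text typically arises and why it still passes UTF-8 validation. The sample bytes and the helper name c1_controls are illustrative assumptions, not anything from the original discussion; the codec names are the standard Python ones.

```python
# Bytes as a Windows-1252 producer would write them: curly quotes around "hi".
raw = b"\x93hi\x94"

# Mislabelled as ISO 8859-1, bytes 0x93 and 0x94 become the C1 controls
# U+0093 and U+0094 instead of the intended quotation marks.
misread = raw.decode("latin-1")

# The result consists of valid code points and encodes to well-formed UTF-8.
utf8 = misread.encode("utf-8")


def c1_controls(text):
    """Return the C1 control characters (U+0080..U+009F) found in text.

    Hypothetical helper: it flags input worth a second look, not invalid input.
    """
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


# A strict UTF-8 decode succeeds; only the heuristic check raises a flag.
suspicious = c1_controls(utf8.decode("utf-8"))

# Decoding the original bytes with the right label recovers the intended text.
intended = raw.decode("cp1252")
```

The strict decode never fails here, which is the point: the check for C1 controls is a heuristic about the source data, separate from UTF-8 validation.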
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages