From: Doug Ewell (doug@ewellic.org)
Date: Sun May 31 2009 - 19:23:39 CDT
Ruszlán Gaszanov <ruszlan at ather dot net> wrote:
> Well, even though C1 control codes are technically valid Unicode
> characters, in practice, Unicode or ISO-8859-x streams containing
> those code points are extremely rare to come by. For most practical
> purposes, the presence of those bytes in a text stream would likely
> suggest Windows 12xx codepage.
Absolutely correct. If you see 0x80 in an "ISO 8859-1" text stream,
it's very likely that the stream should have been interpreted as
Windows-1252 instead.
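A minimal sketch of that heuristic, assuming the stream arrives as raw bytes labeled ISO 8859-1. The function name guess_single_byte_encoding is hypothetical. It relabels the stream as Windows-1252 only if some byte falls in 0x80-0x9F (the C1 range in ISO 8859-1) and the bytes also decode cleanly as cp1252. Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so a stream containing those bytes keeps its original label:

```python
def guess_single_byte_encoding(data: bytes, declared: str = "iso-8859-1") -> str:
    """Hypothetical helper: suggest cp1252 when a 'Latin-1' stream contains C1-range bytes."""
    latin1_names = ("iso-8859-1", "iso8859-1", "latin-1", "latin1")
    if declared.lower() in latin1_names and any(0x80 <= b <= 0x9F for b in data):
        try:
            data.decode("cp1252")  # strict: fails on bytes cp1252 leaves undefined
            return "cp1252"
        except UnicodeDecodeError:
            pass
    return declared


sample = b"Price: \x80 5"  # 0x80 is the euro sign in Windows-1252
print(guess_single_byte_encoding(sample))  # cp1252
print(sample.decode("cp1252"))             # Price: € 5
```

This remains a guess rather than a proof. A genuine ISO 8859-1 stream may legitimately carry C1 controls, even though that is rare in practice.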
But if you see {0xC2, 0x80} in a UTF-8 text stream, it's a perfectly
valid encoding of U+0080, regardless of whether U+0080 was the right
code point to begin with.
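To illustrate why {0xC2, 0x80} is well-formed, here is a small sketch of the two-byte UTF-8 form, which covers U+0080..U+07FF and packs the code point as 110xxxxx 10yyyyyy. The helper utf8_two_byte is hypothetical. Its output is checked against Python's built-in UTF-8 codec, which decodes the pair to U+0080, a C1 control (general category Cc), without complaint:

```python
import unicodedata


def utf8_two_byte(cp: int) -> bytes:
    """Hypothetical helper: encode U+0080..U+07FF as a two-byte UTF-8 sequence."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    # Lead byte carries the high 5 bits, continuation byte the low 6 bits.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


encoded = utf8_two_byte(0x80)
assert encoded == b"\xc2\x80" == "\u0080".encode("utf-8")
decoded = encoded.decode("utf-8")  # valid UTF-8, so no error is raised
print(hex(ord(decoded)), unicodedata.category(decoded))  # 0x80 Cc
```

Being valid UTF-8 says nothing about whether U+0080 was the intended character. It only confirms that the byte sequence is a legal encoding.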
-- 
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Sun May 31 2009 - 19:26:58 CDT