Re: Invalid code points

From: Doug Ewell (doug@ewellic.org)
Date: Sun May 31 2009 - 19:23:39 CDT

Next message: William J Poser: "Re: Invalid code points"

Previous message: Ruszlán Gaszanov: "Re: Invalid code points"
In reply to: Ruszlán Gaszanov: "Re: Invalid code points"

Ruszlán Gaszanov <ruszlan at ather dot net> wrote:

> Well, even though C1 control codes are technically valid Unicode
> characters, in practice, Unicode or ISO-8859-x streams containing
> those code points are extremely hard to come by. For most practical
> purposes, the presence of those bytes in a text stream would likely
> suggest a Windows 12xx code page.

Absolutely correct. If you see 0x80 in an "ISO 8859-1" text stream,
it's very likely that the stream should have been interpreted as
Windows-1252 instead.
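
As a rough illustration, here is a minimal Python sketch of that
heuristic. The helper name c1_byte_offsets and the sample bytes are made
up for this example; the codec names "latin-1" and "cp1252" are the
standard Python ones. It only flags suspicious bytes and does not prove
what the sender intended.

```python
# Hypothetical helper: report offsets of bytes in the C1 range
# (0x80-0x9F) in data that is labelled ISO 8859-1.
def c1_byte_offsets(data: bytes) -> list:
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# Made-up sample: a single 0x80 byte inside otherwise ASCII text.
sample = b"price: \x80 10"

offsets = c1_byte_offsets(sample)

# Read as ISO 8859-1, 0x80 becomes the C1 control U+0080.
as_latin1 = sample.decode("latin-1")

# Read as Windows-1252, the same byte becomes U+20AC EURO SIGN.
as_cp1252 = sample.decode("cp1252")

print(offsets)
print(repr(as_latin1))
print(as_cp1252)
```

If any offsets come back, retrying the decode as Windows-1252 is usually
the better guess, though a handful of bytes in that range are unassigned
in Windows-1252 and Python's cp1252 codec will reject them.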

But if you see {0xC2, 0x80} in a UTF-8 text stream, it's a perfectly
valid encoding of U+0080, regardless of whether U+0080 was the right
code point to begin with.
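
A short Python sketch of that distinction, under the assumption that the
U+0080 arose from a euro sign that was mislabelled along the way (that
scenario is my own illustration, not something from the thread):

```python
import unicodedata

# {0xC2, 0x80} is the well-formed UTF-8 encoding of U+0080.
decoded = bytes([0xC2, 0x80]).decode("utf-8", errors="strict")
category = unicodedata.category(decoded)  # "Cc" means a control character

# A lone 0x80, by contrast, is not well-formed UTF-8.
try:
    bytes([0x80]).decode("utf-8", errors="strict")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False

# Assumed scenario: a euro sign saved as Windows-1252, misread as
# ISO 8859-1, then re-encoded as UTF-8. The result is valid UTF-8
# even though U+0080 was the wrong code point.
mislabelled = "\u20ac".encode("cp1252").decode("latin-1").encode("utf-8")

print(repr(decoded), category, lone_byte_ok, mislabelled)
```

A UTF-8 validator has no grounds to complain about the last value; only
a higher-level check that knows C1 controls are unexpected in the text
could catch it.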

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


This archive was generated by hypermail 2.1.5 : Sun May 31 2009 - 19:26:58 CDT