From: Ruszlán Gaszanov (ruszlan@ather.net)
Date: Sun May 31 2009 - 17:59:22 CDT
Hans Aberg wrote:
> On 31 May 2009, at 19:42, Doug Ewell wrote:
>
>>> In particular, it would be great to know if the range U+0080, …,
>>> U+009F is invalid.
>>
>> That bit is especially wrong. I can at least imagine why there might
>> be confusion about the noncharacters and surrogate code points, but
>> not the C1 controls.
>
> It is a bit disappointing: I was looking for a beginning (escape) byte
> sequence to tell that string isn't UTF-8, among other valid strings.
> But perhaps it does not matter.
>
> Hans
Well, even though C1 control codes are technically valid Unicode
characters, in practice Unicode or ISO-8859-x streams containing those
code points are extremely hard to come by. For most practical purposes,
the presence of bytes 0x80..0x9F in a text stream most likely suggests
a Windows-125x code page.
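
As a rough illustration of that rule of thumb, here is a minimal sketch
in Python. The function name guess_encoding and the labels it returns
are hypothetical, and this is only a heuristic under the assumptions
above, not a real charset detector. It tries strict UTF-8 first; if
that fails and any byte falls in 0x80..0x9F, it guesses a Windows code
page.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic; the returned labels are illustrative only."""
    try:
        # Strict UTF-8 decoding rejects stray bytes in 0x80..0x9F, but
        # accepts C1 code points that are properly encoded (U+0085 is
        # the two-byte sequence C2 85).
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # In ISO-8859-x, bytes 0x80..0x9F are the rarely used C1 controls;
    # in the Windows code pages most of them are printable characters
    # such as curly quotes, so their presence hints at the latter.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-125x"
    return "iso-8859-x"


print(guess_encoding(b"\x93quoted\x94"))  # bytes 0x93/0x94: Windows curly quotes
print(guess_encoding(b"caf\xe9"))         # 0xE9 alone: not UTF-8, no C1 bytes
print(guess_encoding(b"\xc2\x85"))        # U+0085 encoded as valid UTF-8
```

Note that pure ASCII input is also reported as "utf-8" here, since it
decodes cleanly; the sketch cannot tell the two apart.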
Ruszlán