Re: Invalid code points

From: Ruszlán Gaszanov (ruszlan@ather.net)
Date: Sun May 31 2009 - 17:59:22 CDT

Next message: Doug Ewell: "Re: Invalid code points"

Previous message: Doug Ewell: "Re: Invalid code points"
In reply to: Hans Aberg: "Re: Invalid code points"
Next in thread: Doug Ewell: "Re: Invalid code points"
Reply: Doug Ewell: "Re: Invalid code points"

Hans Aberg wrote:
> On 31 May 2009, at 19:42, Doug Ewell wrote:
>
>>> In particular, it would be great to know if the range U+0080, …,
>>> U+009F is invalid.
>>
>> That bit is especially wrong. I can at least imagine why there might
>> be confusion about the noncharacters and surrogate code points, but
>> not the C1 controls.
>
> It is a bit disappointing: I was looking for a beginning (escape) byte
> sequence to tell that string isn't UTF-8, among other valid strings.
> But perhaps it does not matter.
>
> Hans

Well, even though the C1 control codes are technically valid Unicode
characters, in practice Unicode or ISO-8859-x streams containing those
code points are extremely rare. For most practical purposes, the
presence of bytes in the 0x80-0x9F range in an 8-bit text stream would
likely suggest a Windows 12xx code page.
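
As a rough illustration of that heuristic, here is a minimal Python sketch.
The function name guess_encoding and the labels it returns are made up for
this example, and it only looks at byte ranges; it is a guess, not a real
charset detector.

```python
def guess_encoding(data: bytes) -> str:
    """Return a rough guess: 'ascii', 'utf-8', 'windows-12xx' or 'iso-8859-x'.

    Hypothetical helper for illustration only; the labels are not
    registered charset names.
    """
    # Pure 7-bit data is ambiguous between all candidates; call it ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"

    # Well-formed UTF-8 is unlikely to occur by accident in 8-bit text.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # In ISO-8859-x, bytes 0x80-0x9F are the C1 controls, which are rare
    # in real text. Windows code pages put printable characters there
    # (curly quotes, dashes and so on), so their presence hints at Windows.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-12xx"

    return "iso-8859-x"
```

For example, guess_encoding(b"\x93quoted\x94") falls through to the Windows
guess, because 0x93 and 0x94 are not valid UTF-8 on their own and both lie
in the C1 range. A UTF-8 stream that really contains a C1 control (U+0085
is encoded as C2 85) is still reported as UTF-8, since it decodes cleanly.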

Ruszlán


This archive was generated by hypermail 2.1.5 : Sun May 31 2009 - 18:01:44 CDT