Re: Invalid code points

From: Ruszlán Gaszanov (ruszlan@ather.net)
Date: Sun May 31 2009 - 17:59:22 CDT

    Hans Aberg wrote:
    > On 31 May 2009, at 19:42, Doug Ewell wrote:
    >
    >>> In particular, it would be great to know if the range U+0080, …,
    >>> U+009F is invalid.
    >>
    >> That bit is especially wrong. I can at least imagine why there might
    >> be confusion about the noncharacters and surrogate code points, but
    >> not the C1 controls.
    >
    > It is a bit disappointing: I was looking for a beginning (escape) byte
    > sequence to tell that string isn't UTF-8, among other valid strings.
    > But perhaps it does not matter.
    >
    > Hans
    Well, even though C1 control codes are technically valid Unicode
    characters, in practice Unicode or ISO-8859-x streams containing those
    code points are extremely hard to come by. For most practical purposes,
    the presence of bytes in the 0x80-0x9F range in a text stream more
    likely suggests a Windows 12xx code page.
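
    As a rough sketch of that heuristic in Python (the function name and
    the returned labels are made up for illustration, and this is nowhere
    near a real charset detector): try strict UTF-8 first, and only if
    that fails treat bytes in the 0x80-0x9F range as a hint of a Windows
    code page rather than ISO-8859-x.

```python
def guess_encoding(data: bytes) -> str:
    """Very rough guess at the family of an 8-bit text encoding.

    Hypothetical helper: the labels returned are informal names,
    not registered charset identifiers.
    """
    try:
        # Strict decoding; pure ASCII also passes this check.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # In ISO-8859-x, bytes 0x80-0x9F are C1 controls, which real text
    # almost never contains. Windows code pages put printable characters
    # (curly quotes, dashes, etc.) in most of that range.
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-12xx"
    return "iso-8859-x"


if __name__ == "__main__":
    print(guess_encoding(b"\x93quoted\x94"))            # curly quotes in a Windows code page
    print(guess_encoding(b"caf\xe9"))                   # no bytes in 0x80-0x9F
    print(guess_encoding("\u0085".encode("utf-8")))     # C1 control, but valid UTF-8
```

    Note that a C1 control encoded as UTF-8 (0xC2 0x80 through 0xC2 0x9F)
    is still well-formed UTF-8, so it cannot serve as a marker that a
    string isn't UTF-8; the check only says something about byte streams
    that already fail UTF-8 validation.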

    Ruszlán