From: Mark Davis (mark.edward.davis@gmail.com)
Date: Sun May 31 2009 - 12:13:48 CDT
That section is incorrect. The relevant passages are
D76 Unicode scalar value: Any Unicode code point except high-surrogate and
low-surrogate code points.
• As a result of this definition, the set of Unicode scalar values consists
of the ranges 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆, inclusive.
and under D79, p100:
To ensure that the mapping for a Unicode encoding form is one-to-one, all
Unicode scalar values, including those corresponding to noncharacter code
points and unassigned code points, must be mapped to unique code unit
sequences. Note that this requirement does not extend to high-surrogate and
low-surrogate code points, which are excluded by definition from the set of
Unicode scalar values.
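
As a rough illustration of the two passages above (a minimal sketch, not text from the standard): the helper names is_scalar_value and encode_utf8 are made up for this example, and Python's built-in UTF-8 codec stands in for "a Unicode encoding form". Under D76, the range U+0080..U+009F consists of ordinary scalar values, so it must be encodable; only the surrogate range D800..DFFF is excluded.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value in the sense of D76.

    Hypothetical helper: the two ranges are taken directly from the
    quoted definition (0..D7FF and E000..10FFFF, inclusive).
    """
    return 0x0000 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def encode_utf8(cp: int) -> bytes:
    """Encode one scalar value as UTF-8, rejecting everything else.

    Hypothetical helper: surrogates and out-of-range integers are not
    scalar values, so the one-to-one requirement does not cover them.
    """
    if not is_scalar_value(cp):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp).encode("utf-8")


if __name__ == "__main__":
    # C1 controls (U+0080..U+009F) are scalar values and encode normally.
    print(encode_utf8(0x0080))
    # Noncharacters such as U+FFFE are scalar values too.
    print(encode_utf8(0xFFFE))
    # Surrogate code points are excluded by definition.
    try:
        encode_utf8(0xD800)
    except ValueError as err:
        print(err)
```

The explicit range check is there so that the rejection of surrogates follows from the definition itself rather than from whatever a particular codec happens to do.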
Mark
On Sun, May 31, 2009 at 08:55, Hans Aberg <haberg@math.su.se> wrote:
> This quote says that which code points are invalid depends on how you
> read the standard; perhaps someone here can clarify :-):
> http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points
>
> In particular, it would be great to know if the range U+0080, …, U+009F is
> invalid.
>
> Hans Aberg