Re: UTF-8 validation rules

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Sep 10 2001 - 15:48:06 EDT

Previous message: Carl W. Brown: "RE: UTF-8 validation rules"
Maybe in reply to: Carl W. Brown: "UTF-8 validation rules"
Next in thread: Carl W. Brown: "RE: UTF-8 validation rules"
Reply: Carl W. Brown: "RE: UTF-8 validation rules"
Reply: Carl W. Brown: "RE: UTF-8 validation rules"
Reply: David Hopwood: "Re: UTF-8 validation rules"

Carl,

>
> \xEF\xBF\xBE and \xEF\xBF\xBF are invalid Unicode characters.

In current parlance (see Unicode 3.1, UAX #27), these are
"noncharacters", and you must account for the fact that
U+1FFFE..U+1FFFF
U+2FFFE..U+2FFFF
...
U+10FFFE..U+10FFFF

all have the same status as noncharacters.

With Unicode 3.2 (in the works), the 32 additional code points
at U+FDD0..U+FDEF go from unallocated status to noncharacters
as well.
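
As a rough illustration of the set described above (the last two code
points of every plane, plus the U+FDD0..U+FDEF range), a check might look
like the Python sketch below. The name is_noncharacter is hypothetical and
is not taken from the standard or from any library. The sketch assumes the
U+FDD0..U+FDEF range is treated as noncharacters.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is a noncharacter code point (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # The contiguous range of 32 code points U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each plane: U+xxFFFE and U+xxFFFF.
    # Masking with 0xFFFE ignores the plane number and the lowest bit.
    return (cp & 0xFFFE) == 0xFFFE
```

With those assumptions, the check covers 66 code points in total: 2 in each
of the 17 planes, plus the 32 in U+FDD0..U+FDEF.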

UTF-8 (and UTF-16 and UTF-32) convertors must allow the conversion
of noncharacter code points, but may then allow the detection of
their noncharacter status. Noncharacters should not appear in
open interchange of Unicode textual data, but can have internal
usage unspecified by the standard.

Detection of the status of a code point as a noncharacter
(allocated, but unassigned to a character) or as a regular unassigned code
point (not allocated) is conceptually distinct from the
validation of the UTF-8 conversion per se.
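
To make that separation concrete, here is a minimal Python sketch that
keeps the two steps apart. It relies on Python's built-in UTF-8 decoder
for the validation step (it rejects ill-formed sequences but accepts
noncharacter code points). The helper names decode_utf8 and
find_noncharacters are hypothetical, chosen only for this illustration.

```python
def decode_utf8(data: bytes) -> str:
    # Step 1: validation of the UTF-8 conversion only. Ill-formed input
    # raises UnicodeDecodeError; noncharacters pass through unchanged.
    return data.decode("utf-8", errors="strict")


def find_noncharacters(text: str) -> list:
    # Step 2: a separate, optional pass over already-decoded text that
    # reports (index, code point) for each noncharacter found.
    found = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            found.append((index, cp))
    return found


# \xEF\xBF\xBE is the UTF-8 form of U+FFFE: it converts without error,
# and only the second pass flags it.
text = decode_utf8(b"A\xef\xbf\xbeB")
print([(i, hex(cp)) for i, cp in find_noncharacters(text)])
```

What to do with the result of the second pass (reject, replace, or keep
for internal use) is a policy decision for the application, not part of
the conversion itself.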

--Ken
