From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 19 2004 - 20:19:44 CDT
/|/|ike (or |\|\ike) responded to Philippe:
> > However I feel it's not legal (or really not recommended) to encode
> > non-character codepoints xFFFE-xFFFF where x is any plane number. So the
> > rules need to be a bit more detailed to exclude them.
>
> Why do we need special rules to not encode characters that don't
> exist?
Please, everybody, before we start another pointless thread,
examine the actual definition of UTF-8 and the rationale
for an encoding scheme.
UTF-8 must be able to represent every Unicode scalar value --
and that *includes* noncharacter code points.
D28 Unicode scalar value: Any Unicode code point except high-surrogate
and low-surrogate code points.
D29 A Unicode encoding form assigns each Unicode scalar value to a
unique code unit sequence.
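To make the consequence of D28 and D29 concrete, here is a minimal sketch in Python of a UTF-8 encoder that follows those two definitions. The function names (is_scalar_value, is_noncharacter, utf8_encode) are hypothetical, chosen for illustration only, and are not taken from the standard. The sketch rejects surrogate code points and nothing else, so noncharacters such as U+FFFE and U+FFFF get byte sequences like any other scalar value.

```python
def is_scalar_value(cp: int) -> bool:
    """D28: any code point except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value as UTF-8 (illustrative, not optimized)."""
    if not is_scalar_value(cp):
        # Surrogates are the only code points excluded by D28.
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Noncharacters are scalar values, so they encode without complaint.
    for cp in (0xFFFE, 0xFFFF, 0x1FFFF, 0xFDD0):
        print(f"U+{cp:04X} noncharacter={is_noncharacter(cp)} "
              f"bytes={utf8_encode(cp).hex()}")
```

Note that is_noncharacter is never consulted by utf8_encode. Whether an application chooses to interchange noncharacters is a separate question from whether the encoding form can represent them.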
Before you all start shooting from the hip about UTF-8 on the
list, please read (and understand) the normative definitions of
these things in the standard.
--Ken
P.S. Whoever (and whatever) is starting to prepend "[BULK]" to
thread topics, would you cease and desist? ;-)