Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:
>> Try by yourself, you can perfectly send JSON text containing '\uFFFF'
>> (non-character) or '\uF800' (unpaired surrogate) and I've not seen
>> any JSON implementation complaining about one or the other, when
>> receiving the JSON stream and using it in Javascript, you'll see no
>> missing code unit or replaced code units and no exception as well.
>
> Unicode Consortium standards and recommendations allow non-characters
> to be sent; as far as I can make out, they are just not to be thought
> of as unstandardised graphic characters.

As I understand it, from a purely Unicode standpoint, there are
differences here between noncharacters and unpaired surrogates.
Noncharacters are Unicode scalar values, while unpaired surrogates are
not. This means noncharacters may appear in a well-formed UTF-8, -16, or
-32 string, while unpaired surrogates may not. They may both be part of
a "Unicode string" which does not claim to be in any given encoding
form.
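
As a rough illustration of the distinction above, here is a minimal Python sketch. It assumes CPython 3's built-in UTF codecs, which refuse to encode surrogate code points; the helper names (is_scalar_value, encodes_cleanly) are made up for this example and are not taken from any library or standard.

```python
# U+FFFF is a noncharacter, but it is still a Unicode scalar value.
NONCHARACTER = "\uffff"
# U+D800 is a surrogate code point; on its own it is not a scalar value.
LONE_SURROGATE = "\ud800"


def is_scalar_value(cp: int) -> bool:
    """Hypothetical helper: True for any code point outside D800..DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encodes_cleanly(s: str, encoding: str) -> bool:
    """Hypothetical helper: True if Python's strict codec accepts the string."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

# Both values can be held in a Python str without complaint; the difference
# only shows up when a specific encoding form is requested.
noncharacter_ok = [encodes_cleanly(NONCHARACTER, e) for e in ENCODINGS]
surrogate_ok = [encodes_cleanly(LONE_SURROGATE, e) for e in ENCODINGS]

for name, results in (("U+FFFF", noncharacter_ok), ("U+D800", surrogate_ok)):
    print(name, dict(zip(ENCODINGS, results)))
```

Under those assumptions, the noncharacter encodes in all three forms and the lone surrogate is rejected in each, although both can sit in a Python str, which behaves here like a "Unicode string" in the loose sense.
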
Authoritative corrections are welcome to help solidify my understanding.

I don't wish to get involved in debates over JSON. I've read RFC 7159
and I know what it says.
-- Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
Received on Fri May 08 2015 - 17:39:20 CDT