Doug Ewell, Sun, 6 Jan 2013 20:57:58 -0700:
> We are pretty much going round and round on this. The bottom line for
> me is, it would be nice if there were a shorthand way of saying
> "big-endian UTF-16," and many people (including you?) feel that
> "UTF-16BE" is that way, but it is not. That term has a DIFFERENT
> MEANING. The following stream:
>
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
>
> is valid big-endian UTF-16, but it is NOT valid "UTF-16BE" unless the
> leading U+FEFF is explicitly meant as a zero-width no-break space,
> which may not be stripped.
I don't remember whether the RFC defines one of the 3 MIME charsets as
the default, but given that "UTF-16" is supposed to be used whenever one
doesn't know the endianness, it seems logical to assume that the above
example would, by default, be treated as "UTF-16". But apart from that,
we can also say that the example is not valid "UTF-16" either, unless
the U+FEFF is meant as a BOM …
I see the 3 as 3 MIME charsets.
In any case, it does seem to be a question of definition.
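
To make the difference between the labels concrete, here is a minimal
sketch in Python that decodes the quoted byte stream under two of the
labels. It only shows how Python's built-in "utf-16" and "utf-16-be"
codecs behave, which I am assuming is a fair stand-in for the
definitions being discussed; it is not taken from the RFC, and the
variable names are made up for illustration.

```python
# The byte stream quoted above: FE FF followed by "Hello" in big-endian order.
stream = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Labelled "UTF-16": the leading FE FF is read as a BOM, used to pick
# big-endian byte order, and then dropped from the decoded text.
as_utf16 = stream.decode("utf-16")

# Labelled "UTF-16BE": the byte order is already fixed by the label, so the
# leading FE FF is kept as an ordinary character, U+FEFF.
as_utf16be = stream.decode("utf-16-be")

print(repr(as_utf16))    # 'Hello'
print(repr(as_utf16be))  # '\ufeffHello'
```

So the same twelve bytes give five characters under one label and six
under the other, which is the distinction the quoted message draws.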
-- leif h silli
Received on Mon Jan 07 2013 - 04:11:12 CST