Re: Why is "endianness" relevant when storing data on disks but not when in memory? from Leif Halvard Silli on 2013-01-07 (Unicode Mail List Archive)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Mon, 07 Jan 2013 11:08:36 +0100

Doug Ewell, Sun, 6 Jan 2013 20:57:58 -0700:
> We are pretty much going round and round on this. The bottom line for
> me is, it would be nice if there were a shorthand way of saying
> "big-endian UTF-16," and many people (including you?) feel that
> "UTF-16BE" is that way, but it is not. That term has a DIFFERENT
> MEANING. The following stream:
>
> FE FF 00 48 00 65 00 6C 00 6C 00 6F
>
> is valid big-endian UTF-16, but it is NOT valid "UTF-16BE" unless the
> leading U+FEFF is explicitly meant as a zero-width no-break space,
> which may not be stripped.

I don't remember whether the RFC defines one of the 3 MIME charsets as
the default, but given that "UTF-16" is supposed to be used whenever
one doesn't know the endianness, it seems logical to assume that the
above example is treated as "UTF-16" by default. Apart from that, we
can also say that the example is not valid "UTF-16" either, unless the
U+FEFF is meant as a BOM …
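
To make the difference between the labels concrete, here is a small
sketch of how one implementation, Python's built-in codecs, reads the
stream quoted above. This is only an illustration of one decoder's
behaviour, not something taken from the RFC, and the variable names
are made up for the example.

```python
# Byte stream from the quoted example: FE FF followed by "Hello"
# in big-endian UTF-16.
data = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Label "UTF-16": the leading FE FF is read as a BOM. It selects
# big-endian byte order and is dropped from the decoded text.
as_utf16 = data.decode("utf-16")

# Label "UTF-16BE": the byte order is fixed by the label itself, so
# the leading U+FEFF is kept as a character of the text.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))
```

Under "UTF-16" the result is the five characters of "Hello"; under
"UTF-16BE" it is six characters, because the U+FEFF stays in the text.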

I see the 3 as 3 MIME charsets.

In any case, it does seem to be a question of definition.

-- 
leif h silli

Received on Mon Jan 07 2013 - 04:11:12 CST
