Tom Gewecke <Tom_at_bluesky.org> wrote:
|http://tools.ietf.org/html/rfc1641
|
|which I think indicates that utf-16 is the correct interpretation. \
I read this as UTF-16BE:
This character set is encoded as sequences of octets, two per
16-bit character, with the most significant octet first. Text
with an odd number of octets is ill-formed.
Rationale. ISO/IEC 10646-1:1993(E) specifies that when
characters in the UCS-2 form are serialized as octets, that the
most significant octet appear first.
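
To make the quoted rule concrete, here is a minimal sketch in Python of a decoder that follows it literally: two octets per character, most significant octet first, and odd-length input rejected as ill-formed. The function name decode_rfc1641 is made up for illustration and is not taken from the RFC or from any library.

```python
import struct


def decode_rfc1641(octets: bytes) -> str:
    """Decode octets as 16-bit characters, most significant octet first.

    Hypothetical helper, written only to illustrate the quoted RFC text.
    """
    # "Text with an odd number of octets is ill-formed."
    if len(octets) % 2 != 0:
        raise ValueError("odd number of octets: ill-formed text")
    # ">" selects big-endian order; "H" is one unsigned 16-bit unit.
    units = struct.unpack(f">{len(octets) // 2}H", octets)
    # Each 16-bit unit is taken as one character (UCS-2 style, no surrogate pairing).
    return "".join(chr(unit) for unit in units)
```

Under this reading, the octets 00 61 come out as "a", which matches the UTF-16BE interpretation.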
|Does anyone know whether charset="unicode" is at all normal these days?
If you ask me, at least over the wire this is, and always was,
a terroristic charset. Just my one cent.
--:)
attached mail follows:
Recently when troubleshooting an email problem for a Mac user, I came across an email with Content-Type charset="unicode". I had not seen this before. OS X Mail was reading it as Chinese text instead of Latin.
I did find something like this on the IANA list and understand there is an RFC from 1994 that provides info about it:
http://tools.ietf.org/html/rfc1641
which I think indicates that utf-16 is the correct interpretation. However, Mail seems to get the bytes backwards, so 0061 (a) gets read as 6100 (愀).
Does anyone know whether charset="unicode" is at all normal these days?
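
The byte-order mix-up described above can be reproduced with a short Python sketch. It decodes the same two octets once as big-endian and once as little-endian UTF-16. The variable names are illustrative only, and whether Mail really applies a little-endian reading internally is an assumption based on the symptom.

```python
# The two octets of "a" with the most significant octet first.
octets = bytes([0x00, 0x61])

# Big-endian reading: the 16-bit unit is 0x0061.
as_big_endian = octets.decode("utf-16-be")

# Little-endian reading: the same octets form the unit 0x6100.
as_little_endian = octets.decode("utf-16-le")

print(f"big-endian:    U+{ord(as_big_endian):04X} {as_big_endian}")
print(f"little-endian: U+{ord(as_little_endian):04X} {as_little_endian}")
```

The first line shows U+0061 and the second U+6100, so plain Latin text read with the wrong byte order turns into CJK characters.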
Received on Sat Nov 16 2013 - 11:34:44 CST