Re: Odd "Unicode" Charset

From: Steffen <sdaoden_at_gmail.com>
Date: Sat, 16 Nov 2013 18:32:45 +0100

Tom Gewecke <Tom_at_bluesky.org> wrote:
 |http://tools.ietf.org/html/rfc1641
 |
 |which I think indicates that utf-16 is the correct interpretation.

I read this as UTF-16BE:

  This character set is encoded as sequences of octets, two per
  16-bit character, with the most significant octet first. Text
  with an odd number of octets is ill-formed.

  Rationale. ISO/IEC 10646-1:1993(E) specifies that when
  characters in the UCS-2 form are serialized as octets, that the
  most significant octet appear first.
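
A minimal Python sketch of that reading, for illustration only. The function name decode_rfc1641_unicode is hypothetical, and using Python's "utf-16-be" codec as a stand-in for big-endian UCS-2 is an assumption (the two agree for text without surrogate pairs). It rejects an odd number of octets and decodes the rest with the most significant octet first:

```python
def decode_rfc1641_unicode(octets: bytes) -> str:
    """Decode octets labelled charset="unicode", most significant octet first.

    Hypothetical helper: "utf-16-be" stands in for big-endian UCS-2 here.
    """
    # The quoted text calls an odd number of octets ill-formed.
    if len(octets) % 2 != 0:
        raise ValueError("ill-formed: odd number of octets")
    # Two octets per 16-bit character, big-endian.
    return octets.decode("utf-16-be")
```

With this, the octets 00 61 decode to "a", and a three-octet input is rejected instead of being guessed at.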

 |Does anyone know whether charset="unicode" is at all normal these days?

If you ask me: at least over the wire this is, and always was,
a terroristic charset. Just my one cent.

--:)

attached mail follows:


Recently when troubleshooting an email problem for a Mac user, I came across an email with Content-Type charset="unicode". I had not seen this before. OS X Mail was reading it as Chinese text instead of Latin.
I did find something like this on the IANA list and understand there is an RFC from 1994 that provides info about it:

http://tools.ietf.org/html/rfc1641

which I think indicates that utf-16 is the correct interpretation. However, Mail seems to get the bytes backwards, so 0061 (a) gets read as 6100 (愀).
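
The swap described above can be reproduced in a few lines of Python. This is only a sketch of the symptom, assuming the message body is big-endian 16-bit text and the reader decodes it as little-endian; it does not claim to show what Mail does internally:

```python
# The letter "a" serialized with the most significant octet first: 00 61.
octets = "a".encode("utf-16-be")

# Decoding with the matching byte order gives the Latin letter back.
correct = octets.decode("utf-16-be")

# Decoding the same octets as little-endian swaps each pair: 00 61 -> U+6100.
misread = octets.decode("utf-16-le")

print(f"{octets.hex()} -> U+{ord(correct):04X} or U+{ord(misread):04X}")
```

Because every ASCII letter has a zero high octet, the same mistake moves each one to a code point of the form U+xx00, which is why a whole Latin message can come out looking like CJK text.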

Does anyone know whether charset="unicode" is at all normal these days?
Received on Sat Nov 16 2013 - 11:34:44 CST
