Re: Last Call: UTF-16, an encoding of ISO 10646 to Informational

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Sun Aug 15 1999 - 18:40:06 EDT


Patrik Fältström wrote on 1999-08-15 15:55 UTC:
> I.e. from my point of view, Paul tries to register three different names
> which can be used in MIME:
>
> UTF-16
> UTF-16LE
> UTF-16BE
>
> As area director, I need to know whether it is wrong or right to do this
> registration.

Very clear answer:

It is WRONG to register both a bigendian and a littleendian variant
of UTF-16.

Reasons:

  a) it has been long-established practice to use *exclusively* the
     bigendian convention in ISO, ITU, IETF, ECMA, and Internet RFC protocols
  b) there is no technical need for a littleendian format or for two
     alternative UTF-16 formats on the wire
  c) it has been proven that bigendian/littleendian conversion has no
     measurable impact on performance whatsoever
  d) the littleendian convention is an embarrassing historic accident
     that affects only a small number of CPU brands (unfortunately also
     the one I use myself) and bigendian is commonly accepted to be
     the natural and technically sound encoding of multi-byte
     integers today (full story available on request)
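
To make the byte-order question concrete, here is a minimal Python sketch
showing the same text serialized in both orders, and that converting between
them is a plain swap of each byte pair. The sample string and the helper name
swap_bytes are assumptions chosen for illustration only; the sketch says
nothing about performance and is not part of any proposed registration.

```python
# Illustrative sketch: the same text as UTF-16 in both byte orders.
# The sample string and the helper name are assumptions, not from any spec.
text = "A\u00e4"  # 'A' (U+0041) followed by 'ä' (U+00E4)

be = text.encode("utf-16-be")  # bigendian: most significant byte first
le = text.encode("utf-16-le")  # littleendian: least significant byte first


def swap_bytes(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (input length must be even)."""
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves to the front
    swapped[1::2] = data[0::2]  # first byte of each pair moves to the back
    return bytes(swapped)


print(be.hex())  # 004100e4
print(le.hex())  # 4100e400
print(swap_bytes(le) == be)  # True
```

A receiver that knows the wire format is always bigendian never has to guess
which of the two byte sequences it was handed.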

It is probably acceptable to register just "UTF-16" and make it clear
that, in the MIME context, this always refers to the bigendian encoding. I
welcome that UTF-8 is clearly identified as the preferred encoding.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


