Re: UTF-2 to UTF-8 conversion

From: gwm@austin.ibm.com
Date: Thu Dec 11 1997 - 14:53:58 EST


Sandra,

As usual, you are correct. FSS-UTF was called UTF-2 for a short period --
the second transformation format after UTF-1.

The difference between FSS-UTF (UTF-2) and UTF-8 is that FSS-UTF
performs a simple transformation (bit shift) to/from UCS-2/UCS-4,
whereas UTF-8 also takes UTF-16 into account. FSS-UTF predates UTF-16.

Other than that minor difference, FSS-UTF (UTF-2) and UTF-8 are
identical.
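
To make the distinction concrete, here is a rough Python sketch of one
reading of it. The function names are invented for illustration, and the
one-to-six-byte layout is assumed to be the same in both formats. The
first function is the plain bit-shift transformation applied to a single
UCS-2/UCS-4 value; the second shows what "taking UTF-16 into account"
might look like, namely joining a surrogate pair into one value before
applying the same bit shifts.

```python
def encode_bitshift(value):
    """Plain bit-shift encoding of one value in the range 0..0x7FFFFFFF."""
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("value out of range")
    if value < 0x80:
        return bytes([value])
    # (exclusive upper limit, lead-byte marker, number of trailing bytes)
    layouts = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    )
    for limit, lead, extra in layouts:
        if value < limit:
            # Lead byte carries the top bits; each trailing byte carries 6.
            out = [lead | (value >> (6 * extra))]
            for i in range(extra - 1, -1, -1):
                out.append(0x80 | ((value >> (6 * i)) & 0x3F))
            return bytes(out)


def encode_utf16_aware(units):
    """Encode 16-bit code units, joining surrogate pairs first.

    Assumption for this sketch: an unpaired surrogate is passed through
    unchanged rather than rejected.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Combine high and low surrogate into one value >= 0x10000.
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 1
        out += encode_bitshift(u)
        i += 1
    return bytes(out)


pair = [0xD800, 0xDC00]  # UTF-16 surrogate pair for U+10000
naive = b"".join(encode_bitshift(u) for u in pair)  # six bytes
aware = encode_utf16_aware(pair)                    # four bytes
```

For values outside the surrogate range the two functions produce the
same bytes, which is consistent with the formats being otherwise
identical.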

-------------------------------------------------------------------------
   Gary W. Miller Internet - gwm@austin.ibm.com
   IBM JTMS/903 ZIP 9374 X/Open - g.miller@xopen.co.uk
   11400 Burnet Road VNET - AUSTIN(GWM) / GWM at AUSTIN
   Austin, Texas 78758 SENDFILE - GWM at AUSVM6
   Phone: (512) 838-8297 Fax: (512) 838-0169
-------------------------------------------------------------------------

>From: odonnell@zk3.dec.com
>Date: Thu, 11 Dec 1997 09:08:25 -0800 (PST)
>
> > Does anyone out there know the difference between UTF-2
> > (Unicode std 1.0 I believe) and UTF-8? If you are aware of a
> > program that converts files in UTF-8 to UTF-2 and vice versa,
> > please let me know,
>
> The Unicode Standard, Version 1.0, is represented in the
> form known as UCS-2.
>
> Published in the back of The Unicode Standard, Version 1.1,
> (Unicode Technical Report #4) was a transformation format
> identified as FSS-UTF. That is what we now know as UTF-8.
>
> UTF-1 was a transformation format published in the first
> edition of ISO/IEC 10646, since supplanted by amendments which
> have defined UTF-8 and UTF-16.
>
> The Unicode Standard, Version 2.0, is represented in the form
> known as UTF-16--which for all characters encoded so far is
> identical to UCS-2.
>
> UTF-2 is an erroneous term.
> . . .
>
>Maybe not. UTF-2 was the original name for UTF-8. As you
>note, Ken, UTF-1 was a transformation format in the first
>edition of ISO/IEC 10646. It quickly became clear that UTF-1
>was not very useful, so an alternative was developed. I think
>some of the people working on Plan 9 developed it, along with
>Gary Miller of IBM and others. (But I'm getting old, so my
>memory may be faulty.) Anyway, the alternative was called
>UTF-2.
>
>That name changed pretty quickly to FSS-UTF (File System
>Safe UCS Transformation Format). The second name was a mouthful,
>however, so it changed AGAIN to UTF-8.
>
>Because few people remember UTF-2, it is possible that your
>answer here is correct, and the original requestor (I don't
>remember who asked this) meant to type UCS-2 but typed
>UTF-2 instead. But if he really did mean UTF-2, the
>question about converters between UTF-2 and UTF-8 becomes
>relatively simple. These are supposed to be the same thing.
>However, I believe some small changes have slipped into UTF-8
>over time. In that case, these would be almost, but not quite,
>identical.
>
> Historically yours,
> -- Sandra
>-----------------------
>Sandra Martin O'Donnell
>odonnell@zk3.dec.com
>
>
>


