RE: UTF-9

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Thu Oct 30 2003 - 18:09:38 CST


Pub date = 1 April 2003.

I think that's the salient part.

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Philippe Verdy
> Sent: Thursday, October 30, 2003 15:42
> To: John Cowan
> Cc: unicode@unicode.org
> Subject: Re: UTF-9
>
>
> From: "John Cowan" <jcowan@reutershealth.com>
>
> > http://panda.com/tops-20/utf9.txt
> >
> > Res ipsa loquitur.
>
> Are there still platforms today where storage bytes are not octets
> but nonets, i.e. 9-bit based platforms? If so, this proposal makes
> sense, but only as a local optimization for those platforms. Problems
> will come if you want to interchange this data with the large majority
> of hosts that cannot handle a 9th bit in their bytes.
>
> This means that interchange would require either sending 2 octets to
> represent each 9-bit byte without losing data, or using a more complex
> bit pattern to pack sequences of eight 9-bit bytes into sequences of
> nine 8-bit bytes, with a way to interpret the last sequence. (Such
> converters, needed for interoperability, do exist: look for example at
> the MIME Base64 algorithm, which is used to pack sequences of 8-bit
> bytes into serialized octets each with 6 significant bits.)
>
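To make the "eight 9-bit bytes into nine 8-bit bytes" packing concrete,
here is a minimal Python sketch. The function names and the conventions
(most significant bit first, zero padding up to the next octet boundary)
are my own assumptions for illustration; they are not taken from the
UTF-9 proposal or from any existing transfer protocol.

```python
def pack_nonets(nonets):
    """Pack 9-bit values into octets, most significant bit first.

    The final octet is padded with zero bits (an assumed convention).
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for n in nonets:
        if not 0 <= n < 512:
            raise ValueError("not a 9-bit value: %r" % (n,))
        acc = (acc << 9) | n
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_nonets(data):
    """Inverse of pack_nonets.

    Padding is always shorter than 9 bits, so the number of nonets
    is floor(8 * len(data) / 9).
    """
    count = len(data) * 8 // 9
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        if nbits >= 9 and len(out) < count:
            nbits -= 9
            out.append((acc >> nbits) & 0x1FF)
            acc &= (1 << nbits) - 1
    return out


if __name__ == "__main__":
    sample = [0x1FF, 0x000, 0x155, 1, 2, 3, 4, 0x100]
    packed = pack_nonets(sample)
    print(len(packed), unpack_nonets(packed) == sample)
```

With zero padding to the octet boundary, the "last sequence" question
answers itself in this sketch: the receiver can recover the nonet count
from the octet count alone. A real transfer format would still have to
agree on bit order and padding, which is exactly the interchange cost
being pointed out.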
> UTF-9 seems interesting in this case, but is it worth it, given that
> it is not directly interchangeable with the most common networking
> technologies? Can't you accept losing 1 bit per storage byte?
>
> What will happen then to plain text coded with UTF-9 that is sent
> through FTP? Do you mean that FTP should use a Base256 converter for 9-bit
> platforms, similar to Base64 for 8-bit platforms, to avoid losing the most
> significant bit of each transferred byte? How is the recipient of the file
> supposed to interpret the converted data? Is it still plain text?
>
> So if the format is not interchangeable, this UTF-9 transform looks like a
> local-only transformation, and locally, each host can use its own
> representation. Why not, then, a UTF-18 encoding scheme that would avoid
> using UTF-16 surrogates for all characters that fit in the first 4 planes?
>
> For me, a UTF-18 encoding would make more sense if local optimization in
> memory is the issue, as it would represent almost all existing Unicode
> characters in planes 0 (BMP), 1 (SMP), 2 (SIP) and 3 (still not used, but
> you could instead map the SSP, plane 14, there for tags and variation
> selectors, or keep it for later use as SIP2) in one 18-bit code unit...
> But you'll still need a converter to transform it to UTF-8 or a UTF-16
> encoding scheme to perform any I/O.
>
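The UTF-18 idea above is simple enough to sketch in a few lines of
Python. This follows only the description in the quoted message: planes
0 to 2 map to themselves, and the fourth 64K slot holds either plane 3
or, optionally, plane 14. The function names and the map_plane14 switch
are hypothetical, chosen here for illustration.

```python
def utf18_encode(cp, map_plane14=True):
    """Return the single 18-bit code unit for cp, or raise ValueError.

    map_plane14 is a hypothetical switch: when True, plane 14 (SSP)
    occupies the fourth 64K slot; when False, plane 3 does.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    plane = cp >> 16
    if plane <= 2:
        return cp                       # BMP, SMP, SIP map to themselves
    if map_plane14 and plane == 14:
        return 0x30000 | (cp & 0xFFFF)  # SSP folded into slot 3
    if not map_plane14 and plane == 3:
        return cp
    raise ValueError("not representable in one 18-bit unit: %#x" % cp)


def utf18_decode(unit, map_plane14=True):
    """Inverse of utf18_encode for a single 18-bit code unit."""
    if not 0 <= unit < (1 << 18):
        raise ValueError("not an 18-bit value: %#x" % unit)
    if unit >= 0x30000 and map_plane14:
        return 0xE0000 | (unit & 0xFFFF)
    return unit


if __name__ == "__main__":
    for cp in (0x41, 0x1D11E, 0x20000, 0xE0001):
        unit = utf18_encode(cp)
        print(hex(cp), hex(unit), hex(utf18_decode(unit)))
```

Anything outside those four slots (the private-use planes 15 and 16, for
instance) has no code unit in this sketch, which is the price of staying
within 18 bits.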


