Re: UTF-7 signature

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Apr 11 2002 - 23:08:18 EDT


Markus Scherer <markus.scherer@jtcsv.com> wrote:

> On 2002-apr-09, Shlomi Tal and Doug Ewell discussed on this list
> a UTF-7 signature byte sequence of +/v8- (which was news to me).

I don't remember ever reading a recommendation, or even a suggestion, to
use +/v8- as a signature for UTF-7. But that would be the way to encode
a standalone U+FEFF.
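
For anyone who wants to check the bytes, here is a minimal Python sketch of how a standalone U+FEFF comes out as +/v8-. It assumes the whole string is placed in a single shifted run ('+', then the UTF-16BE bytes in base64 without '=' padding, then '-'). The helper name utf7_shifted is made up for this illustration and is not a complete UTF-7 encoder.

```python
import base64

def utf7_shifted(text):
    """Encode text as one UTF-7 shifted run (hypothetical helper, illustration only)."""
    # UTF-7 base64-encodes the big-endian UTF-16 code units.
    utf16 = text.encode("utf-16-be")
    # The modified base64 used by UTF-7 omits the trailing '=' padding.
    b64 = base64.b64encode(utf16).rstrip(b"=")
    # '+' opens the shifted run and '-' closes it explicitly.
    return b"+" + b64 + b"-"

signature = utf7_shifted("\ufeff")
print(signature)  # b'+/v8-'
```

The '8' carries only four bits of U+FEFF plus two zero padding bits, which is why it holds only when U+FEFF stands alone in its shifted run.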

> This illustrates a property of UTF-7 that sets it further apart
> from most encodings than for example SCSU and BOCU-1:
> In most Character Encoding Schemes, consecutive code units/points
> are encoded in _separate_, consecutive byte sequences.
>
> In UTF-7, byte sequences overlap and many bytes in the encoding
> (2 out of 8 I think) contain pieces of two adjacent code units.
> This is more like in Huffman codes.
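
The "2 out of 8" figure can be checked with a small Python sketch. It assumes 16-bit code units packed back to back into 6-bit base64 characters, as in a UTF-7 shifted run. The function name straddling_positions is hypothetical and used only for this illustration.

```python
BITS_PER_CODE_UNIT = 16   # one UTF-16 code unit
BITS_PER_BASE64_CHAR = 6  # one base64 character

def straddling_positions(num_code_units):
    """Return the indexes of base64 characters whose bits span two code units."""
    total_bits = num_code_units * BITS_PER_CODE_UNIT
    # Ceiling division: the last character may be only partly filled.
    num_chars = -(-total_bits // BITS_PER_BASE64_CHAR)
    positions = []
    for i in range(num_chars):
        first_bit = i * BITS_PER_BASE64_CHAR
        # Clamp to the last real data bit so padding bits are not counted.
        last_bit = min(first_bit + BITS_PER_BASE64_CHAR - 1, total_bits - 1)
        if first_bit // BITS_PER_CODE_UNIT != last_bit // BITS_PER_CODE_UNIT:
            positions.append(i)
    return positions

# Three code units fill exactly eight base64 characters (48 bits).
print(straddling_positions(3))  # [2, 5]
```

With three code units, the third and sixth of the eight base64 characters each hold bits from two adjacent code units, which matches the estimate above.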

This is one reason why I'm a little uncomfortable with the wording in
UTR #17, which specifically mentions SCSU as a Transfer Encoding Syntax
(in contrast to a Character Encoding Scheme) but does not mention UTF-7,
which to my mind fits the definition of a TES much better. Perhaps this
is just a conscious effort to ignore UTF-7 in the hope that it will go
away; I have no problem with that.

-Doug Ewell
 Fullerton, California


