From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Dec 28 2009 - 00:35:57 CST
On 12/27/2009 8:09 PM, Doug Ewell wrote:
> Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>
>> The second metric refers to encodings like ISO-2022 or SCSU which use
>> control bytes or sequences to switch among character sets. There are
>> cases where such a scheme could be set up to allow easy
>> resynchronization in terms of character boundaries, yet still require
>> that state information be maintained for very long (unbounded)
>> stretches of data. Assume a 2022-style combination of several single
>> byte character sets. If that restriction is known (by announcement),
>> then resynchronizing to any character boundary is trivial (as long as
>> you recognize and avoid the escape codes). However, interpreting (or
>> correctly converting) any given character is impossible without going
>> back to the most recent character set switching escape code.
>
> BOCU-1 has a handy "reset" mechanism, in which the byte 0xFF doesn't
> participate in the encoding of any character, but simply resets the
> state of the encoder or decoder. If desired, these could be inserted
> at certain intervals within a stream to ensure the availability of a
> synchronization point, solving the problem above.
>
> However, such a mechanism naturally means a code point sequence could
> be encoded in BOCU-1 in more than one way, and it could interfere with
> the seemingly all-important binary-ordering property of BOCU-1, so the
> authors apparently felt compelled to invoke the Principle of
> Pre-Deprecation:
>
> "Using FF to reset the state breaks the ordering! The use of FF resets
> is discouraged."
>
> The reset mechanism doesn't seem to be mentioned in the BOCU patent.
Also, a reset that isn't enforced by protocol, but merely allowed,
doesn't improve the theoretical worst case (while still suffering from
all the problems you mentioned).
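
To make the worst-case point concrete, here is a rough sketch in Python.
It assumes a toy scheme, not real ISO 2022: a single switch byte (ESC,
0x1B) followed by one designator byte selects the active single-byte
set, and ESC never occurs as a data byte. The name active_charset and
the default designator are made up for illustration.

```python
ESC = 0x1B  # toy switch marker; assumed never to occur as a data byte


def active_charset(data: bytes, pos: int, default: int = ord("A")) -> tuple[int, int]:
    """Return (designator, bytes scanned backward) for the data byte at pos.

    Assumes pos points at a data byte, not at ESC or at a designator.
    """
    scanned = 0
    # Walk backward until the most recent switch sequence is found.
    for i in range(pos - 1, -1, -1):
        scanned += 1
        if data[i] == ESC:
            return data[i + 1], scanned
    # No switch seen: the (hypothetical) default set is still active.
    return default, scanned


# One switch at the very start, then a long run of data bytes.
no_resets = bytes([ESC, ord("B")]) + b"x" * 1000

# Same kind of data, but the sender happens to re-announce the set
# in front of every 8 data bytes.
with_resets = (bytes([ESC, ord("B")]) + b"x" * 8) * 100

print(active_charset(no_resets, len(no_resets) - 1))
print(active_charset(with_resets, len(with_resets) - 1))
```

Both streams are legal under the toy rules, so a receiver has to be
prepared for the first one: the backward scan grows with the length of
the run, and only a protocol that requires a switch or reset at fixed
intervals would put a bound on it.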
A./