From: Doug Ewell (doug@ewellic.org)
Date: Sun Dec 27 2009 - 22:09:59 CST
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
> The second metric refers to encodings like ISO-2022 or SCSU which use
> control bytes or sequences to switch among character sets. There are
> cases where such a scheme could be set up to allow easy
> resynchronization in terms of character boundaries, yet still require
> that state information be maintained for very long (unbounded)
> stretches of data. Assume 2022 style combination of several single
> byte character sets. If that restriction is known (by announcement),
> then resynchronizing to any character boundary is trivial (as long as
> you recognize and avoid the escape codes). However, interpreting (or
> correctly converting) any given character is impossible without going
> back to the most recent character set switching escape code.
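To make the quoted point concrete, here is a minimal Python sketch of a toy 2022-style stream. It assumes only single-byte character sets, assumes the byte 0x1B never occurs except as an escape, and uses made-up one-byte designators ("A" and "G") rather than real ISO 2022 escape sequences. The function name decode_at is likewise hypothetical. Every byte outside an escape pair is one character, so finding a boundary is easy, but interpreting a byte means scanning back to the latest escape.

```python
ESC = b"\x1b"

# Hypothetical designator bytes for this toy; NOT real ISO 2022 sequences.
CHARSETS = {ord("A"): "latin-1", ord("G"): "iso8859_7"}
DEFAULT_CHARSET = "latin-1"  # assumed charset before any escape is seen


def decode_at(data: bytes, i: int) -> str:
    """Decode the single character byte at offset i.

    Assumes i points at a character byte, not at ESC or its designator.
    """
    # Look backwards for the most recent escape before offset i.
    # The distance scanned has no fixed upper bound.
    k = data.rfind(ESC, 0, i)
    charset = DEFAULT_CHARSET if k < 0 else CHARSETS[data[k + 1]]
    return data[i:i + 1].decode(charset)


# The same byte value means different things depending on distant state.
stream = ESC + b"G" + b"\xe1" * 1000
print(decode_at(stream, 1001))   # Greek alpha, found via an escape 1001 bytes back
print(decode_at(b"\xe1", 0))     # a-acute under the assumed Latin-1 default
```

In this sketch the last byte of the stream can only be interpreted by looking back past a thousand other characters, which is the unbounded state dependency described above.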
BOCU-1 has a handy "reset" mechanism, in which the byte 0xFF doesn't
participate in the encoding of any character, but simply resets the
state of the encoder or decoder. If desired, these could be inserted at
certain intervals within a stream to ensure the availability of a
synchronization point, solving the problem above.
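As a rough illustration of the idea, and emphatically not an implementation of BOCU-1, here is a toy stateful delta coder in Python with a reserved reset byte. The byte layout (zigzag deltas, 6 bits per byte), the starting value INITIAL, and the reset_every parameter are all assumptions made up for this sketch. The only things taken from BOCU-1 are that 0xFF never encodes a character and that it returns the state to its initial value. The toy makes no attempt to preserve binary ordering.

```python
RESET = 0xFF     # never produced by the delta coder below
INITIAL = 0x40   # arbitrary starting "previous code point" for this toy


def _put_delta(out: bytearray, delta: int) -> None:
    # Zigzag-map the signed delta to a non-negative integer, then emit
    # 6 bits per byte. Bit 0x40 means "more bytes follow", so every
    # emitted byte is <= 0x7F and can never collide with RESET.
    n = 2 * delta if delta >= 0 else -2 * delta - 1
    while True:
        low, n = n & 0x3F, n >> 6
        if n:
            out.append(low | 0x40)
        else:
            out.append(low)
            return


def encode(text: str, reset_every: int = 0) -> bytes:
    """Encode text; optionally insert RESET before every Nth character."""
    out, prev = bytearray(), INITIAL
    for i, ch in enumerate(text):
        if reset_every and i and i % reset_every == 0:
            out.append(RESET)
            prev = INITIAL
        cp = ord(ch)
        _put_delta(out, cp - prev)
        prev = cp
    return bytes(out)


def decode(data: bytes) -> str:
    chars, prev, n, shift = [], INITIAL, 0, 0
    for b in data:
        if b == RESET:
            # Forget all state: decoding can safely start here.
            prev, n, shift = INITIAL, 0, 0
            continue
        n |= (b & 0x3F) << shift
        shift += 6
        if b & 0x40:
            continue  # more bytes of this delta follow
        delta = n >> 1 if n % 2 == 0 else -((n + 1) >> 1)
        prev += delta
        chars.append(chr(prev))
        n, shift = 0, 0
    return "".join(chars)


text = "Ελληνικά, BOCU"
with_resets = encode(text, reset_every=4)
first_reset = with_resets.index(RESET)
# A decoder that joins the stream at a reset byte recovers the rest.
print(decode(with_resets[first_reset:]) == text[4:])
```

Note that encode(text) and encode(text, reset_every=4) are different byte sequences that decode to the same string, so even this toy shows how a reset byte gives one code point sequence more than one encoding.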
However, such a mechanism naturally means a code point sequence could be
encoded in BOCU-1 in more than one way, and it could interfere with the
seemingly all-important binary-ordering property of BOCU-1, so the
authors apparently felt compelled to invoke the Principle of
Pre-Deprecation:
"Using FF to reset the state breaks the ordering! The use of FF resets
is discouraged."
The reset mechanism doesn't seem to be mentioned in the BOCU patent.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s