From: David Starner (prosfilaes@gmail.com)
Date: Sun Dec 27 2009 - 19:28:51 CST
On Sun, Dec 27, 2009 at 7:10 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:
> The first speaks directly to the topic of resynchronization. In the legacy
> DBCS encodings, certain byte values satisfied the conditions to be either a
> leading byte or a trailing byte. The encoding scheme imposes no limit on the
> length of runs of such bytes, making resynchronization, in the worst case,
> the same as re-reading the data stream from the start. Compare that to the
> UTFs where the worst case requires examining 4 bytes to resynchronize.
How do you resynchronize UTF-16? An arbitrary byte-wise seek into ...
43 42 43 42 43 42 43 ... could give 䍂 repeatedly or 䉃 repeatedly.
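
A minimal Python sketch of that ambiguity, assuming big-endian UTF-16 with no BOM. The helper name decode_from is hypothetical and only for illustration; the byte values are the ones from the example above.

```python
def decode_from(data: bytes, offset: int) -> str:
    """Decode UTF-16BE starting at an arbitrary byte offset (hypothetical helper)."""
    chunk = data[offset:]
    # Drop a dangling final byte so the decoder sees whole 16-bit code units.
    chunk = chunk[: len(chunk) - len(chunk) % 2]
    return chunk.decode("utf-16-be")


# The repeating byte run 43 42 43 42 ... from the example.
data = bytes.fromhex("43424342434243424342")

aligned = decode_from(data, 0)  # seek landed on a code unit boundary
shifted = decode_from(data, 1)  # seek landed one byte off

# Print code points rather than glyphs to avoid console encoding issues.
print([hex(ord(c)) for c in aligned])
print([hex(ord(c)) for c in shifted])
```

Both calls decode without error, one as a run of U+4342 and the other as a run of U+4243, so nothing in the bytes themselves says which alignment is the right one.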
-- Kie ekzistas vivo, ekzistas espero.