Re: HTML5 encodings

From: David Starner (prosfilaes@gmail.com)
Date: Sun Dec 27 2009 - 19:28:51 CST


    On Sun, Dec 27, 2009 at 7:10 PM, Asmus Freytag <asmusf@ix.netcom.com> wrote:
    > The first speaks directly to the topic of resynchronization. In the legacy
    > DBCS encodings, certain byte values satisfied the conditions to be either a
    > leading byte or a trailing byte. The encoding scheme imposes no limit on the
    > length of runs of such bytes, making resynchronization, in the worst case,
    > the same as re-reading the data stream from the start. Compare that to the
    > UTFs where the worst case requires examining 4 bytes to resynchronize.

    How do you resynchronize UTF-16? An arbitrary byte-wise seek into ...
    43 42 43 42 43 42 43 ... could give 䍂 (U+4342) repeatedly or 䉃 (U+4243)
    repeatedly, depending on whether you land on an even or an odd byte.
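
    To make the ambiguity concrete, here is a minimal sketch. It assumes
    big-endian byte order with no BOM (which is what the byte pairs above
    imply); the helper name decode_from is made up for illustration. The
    same buffer decodes cleanly from either byte offset, so nothing in the
    data itself tells you which alignment is the right one.

```python
# The byte stream from the example: 43 42 43 42 43 42 43 42
data = bytes([0x43, 0x42] * 4)

def decode_from(buf: bytes, offset: int) -> str:
    """Decode buf as UTF-16BE starting at an arbitrary byte offset.

    Hypothetical helper: drops a trailing odd byte so that only whole
    16-bit code units are decoded.
    """
    chunk = buf[offset:]
    chunk = chunk[: len(chunk) - (len(chunk) % 2)]
    return chunk.decode("utf-16-be")

even = decode_from(data, 0)  # code units 0x4342, 0x4342, ...
odd = decode_from(data, 1)   # code units 0x4243, 0x4243, ...

# Both alignments are valid UTF-16 (no surrogates involved), so the
# decoder raises no error for either one.
print([f"U+{ord(c):04X}" for c in even])
print([f"U+{ord(c):04X}" for c in odd])
```

    Surrogate pairs let you recover code point boundaries once you know
    where the 16-bit code units start, but they do not help you find the
    code unit boundary from a raw byte offset.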

    -- 
    Kie ekzistas vivo, ekzistas espero.
    

