Re: Roundtripping Solved

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Dec 15 2004 - 10:38:55 CST


    Marcin 'Qrczak' Kowalczyk <qrczak at knm dot org dot pl> wrote:

    >> OBSERVATION - Requirement (4) is not met absolutely; however,
    >> the probability of the UTF-8 encoding of this sequence occurring
    >> "accidentally" at an arbitrary offset in an arbitrary octet stream
    >> is approximately one in 2^384;
    >
    > Assuming that the distribution of sequences of characters is uniform.
    > But it's not! As soon as you start using this encoding somewhere,
    > the probability of this sequence appearing rises dramatically.
    > If you convert UTF-8 -> UTF-32 using modified rules, and UTF-32 ->
    > UTF-8 using standard rules, then you get this sequence without waiting
    > 2^340 years.

    Well, of course. Any sequence of events, chosen for a special purpose
    on the basis that it is unlikely to occur naturally, will now occur
    "naturally" much more often (under the new definition of "naturally").

    One of the early rationales for the U+FEFF signature/BOM in UTF-16 was
    that the sequences <0xFF, 0xFE> and <0xFE, 0xFF> were both considered
    unlikely to occur in "normal" text. Of course, now they occur in lots
    of "normal" Unicode text, but they are still doing their job. The
    scheme works.
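
To make the BOM point concrete, here is a minimal sketch of how a reader might use those two octet pairs to guess the byte order of a UTF-16 stream. The function name `sniff_utf16_byte_order` is made up for illustration; the `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` constants come from the Python standard library. It assumes the input is already known to be UTF-16, so it does not try to tell a UTF-16 signature apart from a UTF-32 one.

```python
import codecs
from typing import Optional


def sniff_utf16_byte_order(data: bytes) -> Optional[str]:
    """Guess UTF-16 byte order from a leading U+FEFF signature.

    Hypothetical helper for illustration only. Returns None when the
    stream does not start with either serialization of U+FEFF.
    """
    # U+FEFF serialized big-endian is the octet pair <0xFE, 0xFF>.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    # U+FEFF serialized little-endian is the octet pair <0xFF, 0xFE>.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return None


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    print(sniff_utf16_byte_order(sample))
```

The pairs were picked because they were rare in older text, and the check keeps working even though every signed UTF-16 file now contains one of them, since the signature is only looked for at the start of the stream.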

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


