Re: Compression through normalization

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Dec 05 2003 - 03:34:59 EST


    Peter Kirk <peterkirk at qaya dot org> wrote:

    > Surely ignoring Composition Exclusions is not unilaterally extending
    > C10. The excluded precomposed characters are still canonically
    > equivalent to the decomposed (and normalised) forms. And so composing
    > a text with them, for compression or any other purpose, still conforms
    > to C10, which explicitly allows "replacement of character sequences by
    > their canonical-equivalent sequences" - not only when the resulting
    > sequence is NFC or NFD.

    Ignoring the composition exclusions does still respect canonical
    equivalence, but the result is in neither canonical normalization form,
    NFC or NFD (to use the language of UAX #15). So although it is not a
    violation of C10, it does seem to run afoul of Mark's recommendation:

    "In practice, if a compressor does not produce codepoint-identical text,
    it should produce NFC (not just any canonically equivalent text), and
    should document that it does so."
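To make the distinction concrete, here is a minimal Python sketch using the standard unicodedata module. It assumes a hypothetical compressor step, compose_ignoring_exclusions, that handles only one excluded character, U+0958 DEVANAGARI LETTER QA; a real compressor would need the full list from CompositionExclusions.txt. The output stays canonically equivalent to the input, so C10 is satisfied, but it is no longer NFC.

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA has the canonical decomposition
# U+0915 KA + U+093C NUKTA, but it is listed in CompositionExclusions.txt,
# so NFC never recomposes the pair into U+0958.
KA_NUKTA = "\u0915\u093C"
QA = "\u0958"


def compose_ignoring_exclusions(text: str) -> str:
    """Hypothetical compressor step: recompose KA + NUKTA into QA,
    ignoring the composition exclusion. Handles only this one pair."""
    return text.replace(KA_NUKTA, QA)


def canonically_equivalent(a: str, b: str) -> bool:
    # Two strings are canonically equivalent iff their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def is_nfc(text: str) -> bool:
    # A string is in NFC iff normalizing it to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text


# NFC input: the excluded U+0958 is decomposed to KA + NUKTA.
original = unicodedata.normalize("NFC", QA + " " + KA_NUKTA)
compressed = compose_ignoring_exclusions(original)

print(len(original), len(compressed))                # 5 3
print(canonically_equivalent(original, compressed))  # True
print(is_nfc(original), is_nfc(compressed))          # True False
```

The saving in code points is the appeal; the cost is that the output no longer matches what any NFC normalizer would produce, which is exactly what Mark's recommendation warns against.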

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


