Re: Compression through normalization

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Nov 26 2003 - 07:09:57 EST


    On 25/11/2003 16:38, Doug Ewell wrote:

    >Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >
    >>So SCSU and BOCU-* formats are NOT general purpose compressors. As
    >>they are defined only in terms of stream of Unicode code points, they
    >>are assumed to follow the conformance clauses of Unicode. As they
    >>recognize their input as Unicode text, they can recognize canonical
    >>equivalence, and thus this creates an opportunity for them to consider
    >>if a (de)normalization or de/re-composition would result in higher
    >>compression (interestingly, the composition exclusion could be
    >>reconsidered in the case of BOCU-1 and SCSU compressed streams,
    >>provided that the decompression to code points will redecompose the
    >>excluded compositions).
    >
    >I have to say, if there's a flaw in Philippe's logic here, I don't see
    >it. Anyone?
    >
    >-Doug Ewell
    > Fullerton, California
    > http://users.adelphia.net/~dewell/
    >
    Yes, the compressor can make any canonically equivalent change: not just
    composing composition exclusions, but also reordering combining marks
    that belong to different combining classes. The only flaw I see is that
    the decompressor does not have to undo these changes; at least, no other
    process is allowed to rely on its having done so.
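
    To make the two kinds of change concrete, here is a minimal Python
    sketch. It is only an illustration, not part of SCSU or BOCU-1: the
    helper name canonically_equivalent is hypothetical, and it assumes that
    comparing NFD forms (via the standard unicodedata module) is an adequate
    test of canonical equivalence for these sample strings.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are treated as canonically
    equivalent when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0958 DEVANAGARI LETTER QA is a composition exclusion: NFC never
# produces it, but it is canonically equivalent to U+0915 U+093C.
excluded_composite = "\u0958"
decomposed = "\u0915\u093C"

# Acute (combining class 230) and dot below (combining class 220) are in
# different classes, so their relative order is not significant.
marks_a = "a\u0301\u0323"
marks_b = "a\u0323\u0301"

# Acute and grave are both in class 230, so their order IS significant.
same_class_a = "a\u0301\u0300"
same_class_b = "a\u0300\u0301"

if __name__ == "__main__":
    print(canonically_equivalent(excluded_composite, decomposed))
    print(canonically_equivalent(marks_a, marks_b))
    print(canonically_equivalent(same_class_a, same_class_b))
```

    A compressor could emit whichever member of an equivalent pair encodes
    more compactly, and a receiver that respects canonical equivalence
    could not object to getting the other member back.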

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

