Re: Compression through normalization

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Dec 04 2003 - 13:11:58 EST


    On 04/12/2003 08:39, Doug Ewell wrote:

    >...
    >
    >(2) I am NOT interested in inventing a new normalization form, or any
    >variants on existing forms. Any approach that involves compatibility
    >equivalences, ignores the Composition Exclusions table, or creates
    >equivalences that do not exist in the Unicode Character Database (such
    >as "U+1109 + U+1109 = U+110A") is NOT of interest. That amounts to
    >unilaterally extending C10, which may already be too liberal to be
    >applied to compression.
    >
    Surely ignoring Composition Exclusions is not unilaterally extending
    C10. The excluded precomposed characters are still canonically
    equivalent to the decomposed (and normalised) forms. And so composing a
    text with them, for compression or any other purpose, still conforms to
    C10, which explicitly allows "replacement of character sequences by
    their canonical-equivalent sequences" - not only when the resulting
    sequence is NFC or NFD.
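
A minimal sketch of the distinction, using Python's standard `unicodedata` module. The choice of U+0958 DEVANAGARI LETTER QA as the sample excluded character and the helper name `canonically_equivalent` are illustrative assumptions, not anything from the thread. Comparing NFD forms is one common way to test canonical equivalence; it is not the only one.

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA is listed in the Composition Exclusions table;
# its canonical decomposition is U+0915 KA followed by U+093C NUKTA.
EXCLUDED = "\u0958"
DECOMPOSED = "\u0915\u093C"

# The equivalence Doug Ewell cites as NOT existing in the UCD.
TWO_SIOS = "\u1109\u1109"
SSANGSIOS = "\u110A"


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The excluded precomposed character is still canonically equivalent
# to the decomposed sequence.
qa_equivalent = canonically_equivalent(EXCLUDED, DECOMPOSED)

# NFC never produces the excluded character; it yields the decomposed pair,
# so text containing U+0958 is simply not in NFC.
nfc_keeps_decomposed = unicodedata.normalize("NFC", DECOMPOSED) == DECOMPOSED
nfc_decomposes_excluded = unicodedata.normalize("NFC", EXCLUDED) == DECOMPOSED

# By contrast, U+110A has no decomposition mapping at all, so treating
# U+1109 U+1109 as U+110A would introduce a new equivalence.
jamo_equivalent = canonically_equivalent(TWO_SIOS, SSANGSIOS)
jamo_has_mapping = unicodedata.decomposition(SSANGSIOS) != ""
```

Under these assumptions, substituting U+0958 for U+0915 U+093C stays within canonical equivalence even though the result is neither NFC nor NFD, whereas the jamo substitution does not.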

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

