Re: Compression through normalization

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Dec 05 2003 - 05:51:39 EST

Next message: Peter Kirk: "Re: Supporting the Unicode Project"

Previous message: Michael Everson: "Re: Missing African Latin letters"
In reply to: Doug Ewell: "Re: Compression through normalization"
Next in thread: Mark Davis: "Re: Compression through normalization"
Reply: Mark Davis: "Re: Compression through normalization"

On 05/12/2003 00:34, Doug Ewell wrote:

>Peter Kirk <peterkirk at qaya dot org> wrote:
>
>>Surely ignoring Composition Exclusions is not unilaterally extending
>>C10. The excluded precomposed characters are still canonically
>>equivalent to the decomposed (and normalised) forms. And so composing
>>a text with them, for compression or any other purpose, still conforms
>>to C10, which explicitly allows "replacement of character sequences by
>>their canonical-equivalent sequences" - not only when the resulting
>>sequence is NFC or NFD.
>
>Ignoring the composition exclusions does still respect canonical
>equivalence, but does not preserve a canonical normalization form (using
>the language of UAX #15). So although it is not a violation of C10, it
>does seem to run afoul of Mark's recommendation:
>
>"In practice, if a compressor does not produce codepoint-identical text,
>it should produce NFC (not just any canonically equivalent text), and
>should document that it does so."

OK. So it's Mark, not me, who is unilaterally extending C10. Well, Ken
said much the same, so it's bilateral; and I agree it is a sensible
extension.
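To make the distinction concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper names `canonically_equivalent` and `follows_recommendation` are hypothetical (not taken from any compressor or from UAX #15); the second one simply encodes the quoted recommendation: output is either codepoint-identical or NFC. U+0958 DEVANAGARI LETTER QA serves as an example of a composition exclusion: it is canonically equivalent to KA + NUKTA, so substituting it respects C10, but NFC never produces it.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence (what C10 requires to be preserved):
    # two strings are equivalent iff their NFD forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def follows_recommendation(original: str, output: str) -> bool:
    # Hypothetical check for the quoted recommendation: either
    # codepoint-identical text, or the NFC form of the input,
    # not just any canonically equivalent text.
    return output == original or output == unicodedata.normalize("NFC", original)


if __name__ == "__main__":
    source = "\u0915\u093c"   # DEVANAGARI KA + NUKTA (already NFC)
    excluded = "\u0958"       # DEVANAGARI QA, a composition exclusion
    print(canonically_equivalent(source, excluded))       # True: C10 is respected
    print(follows_recommendation(source, excluded))       # False: not NFC
    print(follows_recommendation("e\u0301", "\u00e9"))    # True: output is NFC
```

So the excluded character passes the C10 test but fails the stricter "produce NFC" test, which is exactly the gap Doug points out.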

But, as Ken also pointed out, it is quite permissible to use any
encoding for the intermediate (e.g. compressed) form of the text, as
long as the normalised form of the original text can be recovered from
it. My suggestion of composing the text with the composition-excluded
precomposed characters meets this test, in a way not met by some of the
other suggestions, e.g. composing Korean characters into precomposed
forms which are (sadly) not canonically equivalent.
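A rough sketch of the composition-exclusion idea, again in Python with `unicodedata`. It assumes, purely for illustration, a compressor that first applies NFC and then replaces any remaining two-code-point sequence with a single excluded precomposed character whose canonical decomposition is exactly that sequence; `excluded_pairs`, `compress` and `decompress` are hypothetical names. Because each substitution swaps a sequence for a canonically equivalent one, the packed text stays canonically equivalent to the input, and one NFC pass recovers the normalised form of the original.

```python
import sys
import unicodedata


def excluded_pairs() -> dict[str, str]:
    """Map a two-code-point canonical sequence to a single precomposed
    character that is canonically equivalent to it but that NFC never
    produces (i.e. a composition-excluded character)."""
    table: dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if not unicodedata.decomposition(ch):
            continue  # no decomposition mapping at all
        seq = unicodedata.normalize("NFD", ch)
        # Keep only pairs that NFC leaves uncomposed; for ordinary
        # characters such as U+00E9, NFC already does the packing.
        if len(seq) == 2 and unicodedata.normalize("NFC", seq) == seq:
            table.setdefault(seq, ch)
    return table


def compress(text: str, table: dict[str, str]) -> str:
    """Normalise to NFC, then pack leftover pairs into excluded characters.
    Every substitution is canonically equivalent, so C10 still holds."""
    s = unicodedata.normalize("NFC", text)
    out: list[str] = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def decompress(packed: str) -> str:
    """Recover the NFC form of the original text."""
    return unicodedata.normalize("NFC", packed)


if __name__ == "__main__":
    table = excluded_pairs()
    text = "\u0915\u093c\u0916\u093c"  # KA + NUKTA, KHA + NUKTA
    packed = compress(text, table)     # U+0958 U+0959
    print(len(text), len(packed))      # 4 2
    print(decompress(packed) == unicodedata.normalize("NFC", text))  # True
```

Building the table scans every code point once, so a real implementation would compute it once and cache it. Only two-code-point decompositions are handled; longer ones (e.g. U+FB2C, which decomposes to three code points) are skipped to keep the sketch short.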

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Fri Dec 05 2003 - 06:28:17 EST