From: Arcane Jill (arcanejill@ramonsky.com)
Date: Tue Nov 25 2003 - 05:32:26 EST
I'm pretty sure it depends on whether you regard a text document as a
sequence of characters, or as a sequence of glyphs. (Er - I mean
"default grapheme clusters" of course). Regarded as a sequence of
characters, normalisation changes that sequence. But regarded as a
sequence of glyphs, normalisation leaves the sequence unchanged. So a
compression algorithm could legitimately claim to be "lossless" if it
did normalisation but operated at the glyph level.
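To make the character-vs-cluster distinction concrete, here is a minimal
Python sketch (standard library only): NFC normalisation rewrites the
codepoint sequence, but both spellings are one and the same default
grapheme cluster, i.e. the same thing on screen.

    import unicodedata

    decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
    precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

    # As codepoint sequences the two spellings differ...
    print([hex(ord(c)) for c in decomposed])    # ['0x65', '0x301']
    print([hex(ord(c)) for c in precomposed])   # ['0xe9']

    # ...but NFC maps one onto the other: the default grapheme
    # cluster (what the reader sees) is unchanged.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True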
I'm pretty sure you DON'T need to preserve the byte stream bit for bit.
For example, at the byte level, I see no reason to preserve invalid
encoding sequences, and at the codepoint level I see no reason to
preserve non-character codepoints. So - at the glyph level - we only
need to preserve glyphs, no? It all depends on how the compression
algorithm describes itself.
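As a sketch of the kind of "loss" I mean, here's some rough Python; the
scrub() and is_noncharacter() helpers are hypothetical, just to show the
idea. Invalid UTF-8 sequences get replaced during decoding, and
noncharacter codepoints get dropped outright.

    def is_noncharacter(cp: int) -> bool:
        # Noncharacters: U+FDD0..U+FDEF plus the last two codepoints
        # of every plane (U+xxFFFE and U+xxFFFF).
        return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

    def scrub(raw: bytes) -> str:
        # Invalid UTF-8 sequences become U+FFFD while decoding;
        # noncharacter codepoints are then filtered out.
        text = raw.decode("utf-8", errors="replace")
        return "".join(ch for ch in text if not is_noncharacter(ord(ch)))

    # b"\xff" is an invalid UTF-8 byte; b"\xef\xb7\x90" encodes the
    # noncharacter U+FDD0. The result is "ab\ufffdcd".
    print(repr(scrub(b"ab\xff" + b"\xef\xb7\x90" + b"cd")))

Neither step is reversible, but under the reading above neither discards
a glyph, so such a compressor could still call itself "lossless" at the
glyph level.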
I think this might go wrong for "tailored grapheme clusters", but I
don't know much about them.
Jill