Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Dec 12 2003 - 01:13:13 EST

Next message: jameskass@att.net: "RE: character map in Microsoft Word"

Previous message: Murray Sargent: "RE: character map in Microsoft Word"
In reply to: Kenneth Whistler: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Kenneth Whistler: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Kenneth Whistler <kenw at sybase dot com> wrote:

> It is perfectly conformant with the Unicode Standard to assert
> that <U+00E9> "é" and <U+0065, U+0301> "é" are different
> Unicode strings. They *are* different Unicode strings. They
> contain different encoded characters, and they have different
> lengths.
> ...
> What canonical equivalence is about is making non-distinctions
> in the *interpretation* of equivalent sequences. No Unicode-
> conformant process should assume that another process will
> systematically distinguish a meaningful interpretation
> difference between <U+00E9> "é" and <U+0065, U+0301> "é" --
> they both represent the *same* abstract character, namely
> an e-acute.

Just to wrap up the discussion we had last week on compression:

For me at least, this settles it. Compression engines generally operate
at a level where strings of encoded characters, not their
interpretation, are at issue. Differences between canonically
equivalent strings are not relevant for interpretation, but may be very
relevant for other properties, like string length and checksums.
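
To make that concrete, here is a minimal sketch in Python. The helper
name describe() and the choice of UTF-8 and CRC-32 are my own
assumptions for illustration, not anything prescribed by the Standard.

```python
import unicodedata
import zlib

# The two spellings of e-acute from Ken's message.
composed = "\u00e9"        # <U+00E9>
decomposed = "e\u0301"     # <U+0065, U+0301>

def describe(s):
    """Hypothetical helper: code point count, UTF-8 byte count, CRC-32."""
    data = s.encode("utf-8")
    return len(s), len(data), zlib.crc32(data)

# As encoded strings, the two differ in content and in length.
strings_differ = composed != decomposed

# After normalization to NFC, they are the same string.
canonically_equivalent = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(describe(composed))
print(describe(decomposed))
print(strings_differ, canonically_equivalent)
```

The two strings compare unequal and have different lengths (one code
point versus two, two UTF-8 bytes versus three), yet they normalize to
the same thing.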

That being the case, it would *not* generally be appropriate for a
compressor to normalize its input text. To do so would be to introduce
differences at a level where there should be none.
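
A rough sketch of the difference, again in Python. zlib stands in for
"a compressor" here, and the function names are hypothetical; the point
is only what happens to the round trip when normalization is slipped in
ahead of compression.

```python
import unicodedata
import zlib

def compress(text):
    """Plain compressor: encodes the string exactly as given."""
    return zlib.compress(text.encode("utf-8"))

def decompress(blob):
    return zlib.decompress(blob).decode("utf-8")

def normalizing_compress(text):
    """Hypothetical compressor that normalizes to NFC first."""
    return zlib.compress(unicodedata.normalize("NFC", text).encode("utf-8"))

original = "cafe\u0301"    # "cafe" followed by U+0301, 5 code points

# The plain compressor returns exactly the string it was given.
round_trip = decompress(compress(original))

# The normalizing compressor returns a canonically equivalent string
# that is nonetheless a different encoded string (4 code points).
altered = decompress(normalizing_compress(original))

print(round_trip == original)
print(altered == original, len(original), len(altered))
```

The plain round trip preserves the input exactly. The normalizing one
hands back a string with the same interpretation but a different
length, so any length or checksum recorded for the original no longer
matches.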

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 01:59:28 EST