RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Dec 12 2003 - 07:29:11 EST


    Peter Kirk wrote:
    > >... Now how will you implement indexing with these
    > >private PUAs which change semantics across documents? What is the
    > >relevant scope for these PUAs?
    > >
    > >
    > The scope would be one instance of a document opened in an application.
    > As for implementation details, that is for implementers to sort out.
    > This was a tentative suggestion which I made in passing, not something
    > which I had thought through in detail.

    But what you suggest here is exactly what a standard file compressor does.

    It does not solve any problem in the representation of characters: the
    compression scheme remains private, and the result can only be interpreted
    as text by decomposing these PUAs (within their scope) back into the
    appropriate complex DGCs. In addition, you need to find a way to store the
    associations between PUAs and DGCs, so the complexity is even worse.
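
    To make the cost of that association table concrete, here is a minimal
    sketch in Python. Everything in it is hypothetical: the function names,
    the use of U+E000..U+F8FF as the PUA pool, and above all split_clusters(),
    which only approximates DGC segmentation as "a base character plus the
    combining marks that follow it". It also assumes the input text contains
    no PUA characters of its own.

```python
import unicodedata

# Assumption: the BMP Private Use Area is the pool of substitute code points.
PUA_START, PUA_END = 0xE000, 0xF8FF


def split_clusters(text):
    """Rough stand-in for DGC segmentation (not the full Unicode rules):
    a base character followed by any characters with a non-zero
    canonical combining class."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def encode_with_pua(text):
    """Replace every multi-character cluster with a PUA character.
    Returns the encoded text and the PUA -> cluster table, which is
    only meaningful for this one document."""
    assigned = {}  # cluster -> PUA character
    out = []
    for cluster in split_clusters(text):
        if len(cluster) == 1:
            out.append(cluster)
            continue
        if cluster not in assigned:
            code = PUA_START + len(assigned)
            if code > PUA_END:
                raise ValueError("ran out of PUA code points")
            assigned[cluster] = chr(code)
        out.append(assigned[cluster])
    table = {pua: cluster for cluster, pua in assigned.items()}
    return "".join(out), table


def decode_with_pua(encoded, table):
    """Expand the PUA characters back into their clusters."""
    return "".join(table.get(ch, ch) for ch in encoded)
```

    Note that the encoded text is unreadable without the table returned by
    encode_with_pua(), so the table has to be stored or transmitted along
    with the document. That is the extra complexity described above.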

    You would probably use it only if there are multiple occurrences of these
    complex DGCs, just to save some space. This is what is done with the
    Hangul Johab syllables, as they occur very frequently when writing modern
    Korean; the space benefit comes from the fact that there is no need to
    encode the associations between syllables and DGCs of jamos, as these are
    defined by their canonical equivalences and implemented with a very basic
    algorithm.
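
    For reference, the arithmetic behind the Hangul case can be sketched as
    follows. The constants are those of the Hangul syllable composition
    algorithm in the Unicode Standard; the function names are only
    illustrative, and compose_hangul() assumes it is given a well-formed
    sequence of two or three conjoining jamos.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(syllable):
    """Map one precomposed syllable to its jamo sequence by arithmetic
    alone. Anything outside the syllable block is returned unchanged."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


def compose_hangul(jamos):
    """Inverse mapping. Assumes `jamos` is a valid L V or L V T sequence."""
    l_index = ord(jamos[0]) - L_BASE
    v_index = ord(jamos[1]) - V_BASE
    t_index = ord(jamos[2]) - T_BASE if len(jamos) == 3 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def agrees_with_nfd(syllable):
    """Cross-check against the canonical decomposition from unicodedata."""
    return decompose_hangul(syllable) == unicodedata.normalize("NFD", syllable)
```

    No per-syllable table is stored anywhere: three divisions are enough to
    go in either direction, which is why this particular mapping costs so
    little.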

    So unless you can create such a simple algorithm to map complex DGCs to PUA
    ranges, there's little use for what you propose here.

    Do we need it?

    - Certainly not to encode text: a general-purpose compression scheme of
    the Lempel-Ziv family or similar (gzip, deflate, bzip2...) will be more
    generic and will perform better, in a more interoperable way, as these
    compression formats are already well documented and widely implemented,
    and do not need a special agreement to limit the scope of PUAs.

    - But maybe yes if the intent is to compute glyph IDs to map complex DGCs
    within font tables. The scope of these PUAs being the font itself, mapping
    glyph IDs from public code points and PUAs is indeed a good solution.
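
    As an illustration of the first point (generic compression rather than
    private PUAs), here is a small sketch using Python's zlib module, which
    implements deflate. The sample text and the function name are made up
    for the example; no compression ratio is claimed beyond "smaller than
    the input" for this highly repetitive sample.

```python
import zlib


def roundtrip(text):
    """Compress UTF-8 text with deflate and restore it.
    Returns (raw size, compressed size, restored text)."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw)
    restored = zlib.decompress(packed).decode("utf-8")
    return len(raw), len(packed), restored


# Hypothetical sample: one complex cluster (q + two combining marks)
# repeated many times, as in a text that uses it frequently.
sample = "q\u0303\u0323 " * 200
raw_size, packed_size, restored = roundtrip(sample)
```

    The repeated clusters are absorbed by the compressor, the original
    code points come back unchanged, and any deflate implementation can
    read the result without an agreement on the meaning of PUA code points.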






    This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 08:09:26 EST