RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Dec 12 2003 - 07:29:11 EST

Next message: Michael Everson: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Previous message: Arcane Jill: "RE: Text Editors and Canonical Equivalence (was Coloured diacritics)"
In reply to: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Next in thread: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"
Reply: Peter Kirk: "Re: Text Editors and Canonical Equivalence (was Coloured diacritics)"

Peter Kirk wrote:
> > ... Now how will you implement indexing with these
> > private PUAs whose semantics change across documents?
> > What is the relevant scope for these PUAs?
>
> The scope would be one instance of a document opened in an application.
> As for implementation details, that is for implementers to sort out.
> This was a tentative suggestion which I made in passing, not something
> which I had thought through in detail.

But what you suggest here is exactly what a standard file compressor does.

It does not solve any problem in the representation of characters: the
compression scheme remains private, and the data can only be interpreted as
text by re-expanding these PUAs (within their scope) into the appropriate
complex DGCs (default grapheme clusters). In addition, you need a way to
store the associations between PUAs and DGCs, so the complexity is even
worse.
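To make that cost concrete, here is a minimal Python sketch of the
document-scoped scheme under discussion. It assumes the text has already been
segmented into grapheme clusters (Python's standard library has no UAX #29
segmenter) and that the input contains no PUA characters of its own;
`pua_encode` and `pua_decode` are hypothetical names for illustration. Note
that the association table comes back alongside the encoded text: without it,
the PUAs cannot be turned back into text.

```python
# Hypothetical sketch: document-scoped PUA substitution for grapheme clusters.
# Assumes pre-segmented input and no PUA characters already present in it.
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area

def pua_encode(clusters):
    """Replace each distinct multi-code-point cluster with a PUA character.

    Returns the encoded string and the reverse association table, which
    must be stored with the text for it to be decodable at all.
    """
    table = {}  # cluster -> PUA character
    out = []
    for cluster in clusters:
        if len(cluster) == 1:
            out.append(cluster)  # single code points are left as-is
            continue
        if cluster not in table:
            cp = PUA_START + len(table)
            if cp > PUA_END:
                raise ValueError("BMP Private Use Area exhausted")
            table[cluster] = chr(cp)
        out.append(table[cluster])
    reverse = {pua: cluster for cluster, pua in table.items()}
    return "".join(out), reverse

def pua_decode(encoded, reverse_table):
    """Expand PUA characters back into their clusters using the table."""
    return "".join(reverse_table.get(ch, ch) for ch in encoded)

clusters = ["e\u0301", "t", "e\u0301", "\u05d1\u05bc\u05b8"]
encoded, table = pua_encode(clusters)
print(len(encoded), len(table), pua_decode(encoded, table) == "".join(clusters))
```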

You would probably use it only if these complex DGCs occur many times, just
to save some space. This is what is done with the Hangul Johab syllables,
which occur very frequently in modern Korean text; the space saving comes
from the fact that the associations between syllables and their jamo DGCs
do not need to be stored, as they are defined by canonical equivalence and
implemented with a very simple algorithm.
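The Hangul case needs no stored table because the mapping is pure
arithmetic. A sketch in Python, using the constants of the Hangul syllable
algorithm given in the Unicode Standard (Section 3.12, "Conjoining Jamo
Behavior"); the function names are illustrative:

```python
# Constants of the Hangul syllable (de)composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables

def decompose_hangul(syllable):
    """Canonically decompose one precomposed Hangul syllable into jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT  # 0 means no trailing consonant
    return lead + vowel + (chr(T_BASE + t_index) if t_index else "")

def compose_hangul(lead, vowel, trail=None):
    """Compose a leading consonant, a vowel and an optional trailing
    consonant jamo into one precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a composable L V sequence")
    t_index = 0
    if trail is not None:
        t_index = ord(trail) - T_BASE
        if not 0 < t_index < T_COUNT:
            raise ValueError("not a trailing consonant jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)
```

Both directions are a handful of integer operations, which is why no
per-syllable association ever has to be stored.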

So unless you can devise a similarly simple algorithm to map complex DGCs to
PUA ranges, what you propose here is of little use.

Do we need it?

- Certainly not to encode text: a general-purpose compression scheme
(Lempel-Ziv-based gzip/deflate, bzip2...) will be more generic, will perform
better, and will be more interoperable, as these compression formats are
already well documented and widely implemented, and they need no special
agreement to limit the scope of PUAs.

- But maybe yes, if the intent is to compute glyph IDs for complex DGCs
within font tables. As the scope of these PUAs is then the font itself,
mapping both public code points and PUAs to glyph IDs is indeed a good
solution.
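For comparison, the generic route from the first point needs nothing beyond
an existing library. A minimal sketch using Python's `zlib` module, which
implements DEFLATE (the same algorithm gzip uses); `deflate_roundtrip` is an
illustrative name. Repeated combining sequences are encoded as
back-references by the compressor, with no private agreement about code
points.

```python
import zlib

def deflate_roundtrip(text):
    """Compress text as UTF-8 with DEFLATE (via zlib), then restore it.

    Returns (utf8_size, compressed_size, restored_text).
    """
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)  # level 9: best compression
    restored = zlib.decompress(packed).decode("utf-8")
    return len(raw), len(packed), restored

# A text dominated by one repeated multi-code-point cluster (e + two marks).
sample = "e\u0301\u0323 " * 200
size_in, size_out, back = deflate_roundtrip(sample)
print(size_in, size_out, back == sample)
```

The compressed stream is decodable by any DEFLATE implementation, whereas
the PUA scheme is only decodable by whoever holds the association table.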

This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 08:09:26 EST