John> After messing with many ugly and inefficient algorithms, I would
John> like input on the following partial strategy:
John> If a base+combining sequence is EXACTLY equivalent to ONE
John> precomposed character, reduce it to the single character.
John> Otherwise, do not reduce it at all.
John> The argument for this is that reduction to compatibility characters
John> will be done for interoperation with Level 1-type systems that can't
John> cope with combining characters, not for mere compression. If
John> there's no way to weed out all the combining characters, the
John> recipient must be prepared to cope with some, and if some, why not
John> all?
John> What do the assembled Unicoders think of that idea?
This is the approach I have taken from the beginning. My attitude was: if
Unicode has the characters, why not use them, as long as they don't cause
problems and their use by the application is clearly documented?
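
For illustration, John's rule could be sketched in Python roughly as follows.
This is only a sketch under assumptions: it uses the standard unicodedata
module, treats "exactly equivalent" as canonical (NFC) composition, and the
name reduce_exact is made up for this example. It is not anyone's actual
implementation.

```python
import unicodedata


def reduce_exact(text: str) -> str:
    """Reduce a base+combining sequence only when the WHOLE sequence is
    equivalent to exactly one precomposed character (hypothetical helper)."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        # Collect one character plus any combining marks that follow it
        # (marks are detected by a nonzero canonical combining class).
        j = i + 1
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1
        cluster = text[i:j]
        if j - i > 1:
            composed = unicodedata.normalize("NFC", cluster)
            # Assumption: "exactly equivalent to ONE character" means the
            # cluster composes to a single code point under NFC.
            # Anything else is left completely untouched.
            out.append(composed if len(composed) == 1 else cluster)
        else:
            out.append(cluster)
        i = j
    return "".join(out)
```

Under this reading, "e" + U+0301 becomes the single character U+00E9, while
"q" + U+0301 (no precomposed form) and partially composable sequences are
passed through unchanged, so the recipient still has to cope with combining
characters in those cases.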
I would like to point out that in an actual editing environment, maximally
decomposed characters tend to eliminate certain kinds of bookkeeping problems
that would normally result from using maximally composed characters.
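
One possible reading of the bookkeeping point, sketched in Python: removing
the last accent from a character. The two function names are hypothetical,
and the choice of "delete the last accent" as the editing operation is an
assumption made for this example only.

```python
import unicodedata


def delete_last_mark_decomposed(text: str) -> str:
    """Decomposed text: each accent is its own code point, so removing
    the last accent is a plain one-element deletion."""
    if text and unicodedata.combining(text[-1]) != 0:
        return text[:-1]
    return text


def delete_last_mark_composed(text: str) -> str:
    """Composed text: the accent is hidden inside a precomposed character,
    so the editor has to decompose, edit, and recompose."""
    if not text:
        return text
    parts = unicodedata.normalize("NFD", text[-1])
    if len(parts) > 1 and unicodedata.combining(parts[-1]) != 0:
        # Drop the last mark, then recompose what is left.
        return text[:-1] + unicodedata.normalize("NFC", parts[:-1])
    return text
```

In the composed case the editor also has to know which precomposed characters
exist in order to rebuild the remainder, which is the kind of extra state the
decomposed representation avoids.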
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher                    "A designer knows he has achieved perfection
Computing Research Lab           not when there is nothing left to add, but
New Mexico State University      when there is nothing left to take away."
Box 30001, Dept. 3CRL                -- Antoine de Saint-Exupéry
Las Cruces, NM 88003