From: Doug Ewell (dewell@adelphia.net)
Date: Sat Nov 29 2003 - 21:14:10 EST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
> I've tried experimenting with a collation algorithm that implements UCA
> using the same system as the UCD decompositions, but with added (and
> sometimes modified) decompositions. This system creates new "code
> points" needed only to represent <font> compatibility differences,
> ligatures, or alternate forms. Each existing compatibility character
> is decomposed into more basic characters, which carry the primary
> differences in UCA, plus these new characters, which are given
> "variable" collation weights and may be ignored by applications that
> ignore the extra levels. This encoding uses a 31-bit code space, which
> is highly compressible and still representable with the UTF-8 TES
> (although the values are not Unicode code points) or a similar ad-hoc
> representation.
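For illustration only, here is a sketch of how a 31-bit value could be serialized with the original 1998 UTF-8 bit layout from RFC 2279, which used up to six bytes. It is not the code Philippe used, which was not posted, and the name `encode_legacy_utf8` is hypothetical. RFC 3629 (November 2003) restricts UTF-8 to U+0000..U+10FFFF, so any output this produces for larger values is not valid UTF-8.

```python
def encode_legacy_utf8(value: int) -> bytes:
    """Encode a non-negative integer below 2**31 with the RFC 2279 layout.

    Hypothetical helper for illustration. Values above 0x10FFFF produce
    4-, 5- or 6-byte sequences that RFC 3629 UTF-8 does NOT allow.
    """
    if not 0 <= value < 2**31:
        raise ValueError("value must fit in 31 bits")
    if value < 0x80:
        return bytes([value])  # single byte, same as ASCII
    # (exclusive upper bound, sequence length, lead-byte marker bits)
    for limit, length, marker in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if value < limit:
            break
    out = bytearray(length)
    # Fill continuation bytes (10xxxxxx) from the end, 6 bits at a time.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)
        value >>= 6
    out[0] = marker | value  # remaining high bits go in the lead byte
    return bytes(out)


print(encode_legacy_utf8(0x20AC).hex())      # same bytes as real UTF-8
print(encode_legacy_utf8(0x7FFFFFFF).hex())  # 6-byte form, not UTF-8
```

For real code points (excluding surrogates) the output matches standard UTF-8, which is exactly why such private 31-bit data can be mistaken for UTF-8 text.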
Please don't use UTF-8 to encode anything other than Unicode code
points.
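A quick way to see the practical problem: a conforming UTF-8 decoder rejects sequences for values above U+10FFFF, including the old 5- and 6-byte forms. This sketch relies on Python's built-in `utf-8` codec, which follows RFC 3629; the helper name `decodes_as_utf8` is hypothetical.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if Python's strict built-in UTF-8 codec accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "U+10FFFF (largest code point)": b"\xf4\x8f\xbf\xbf",
    "0x110000 in 4-byte form": b"\xf4\x90\x80\x80",
    "0x7FFFFFFF in 6-byte form": b"\xfd\xbf\xbf\xbf\xbf\xbf",
}
for label, data in samples.items():
    print(label, "accepted" if decodes_as_utf8(data) else "rejected")
```

Only the first sample is accepted, so data encoded in a private 31-bit space would be refused (or replaced with U+FFFD) by ordinary UTF-8 software.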
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Sat Nov 29 2003 - 21:52:56 EST