From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat May 08 2004 - 16:27:40 CDT
From: "E. Keown" <k_isoetc@yahoo.com>
> Thank you Philippe for taking the time to explain. I
> originally wanted to be a digital lexicographer, so I
> am interested in perfect collation.
You're welcome! I hope my explanation of the basic concepts was useful. In fact,
the Unicode collation algorithm is a bit more complex, because it takes into
account subtler features needed to cover various languages. My examples were
greatly simplified compared to what you can do with Unicode collation.
> I assume that Philippe's 'DUCET' and Michael Everson's
> "default template" refer to the same item. And
> Unicode-compliant software will support DUCET.
"DUCET" is referenced in the Unicode standard documenting collation. It's a
prebuilt table of collation "weights" (the term for the comparable numeric
values that allow characters and strings to be matched and ordered), computed
according to what is really a standardized (but tailorable) default collation
order, plus some arbitrary numeric thresholds and arbitrary "gap" values (which
simplify some implementations of tailoring, because insertions do not require
renumbering the weights).
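To make the role of the gaps concrete, here is a minimal Python sketch. The
table, the gap size of 16, and the helper names (tailor_after, sort_key) are all
invented for illustration; they are not DUCET values and not part of any real
collation API. It only shows why leaving room between weights lets a tailoring
insert a character without touching the existing entries.

```python
# Toy primary weights, NOT real DUCET values: entries are spaced GAP apart.
GAP = 16
weights = {ch: (i + 1) * GAP for i, ch in enumerate("abcdz")}

def tailor_after(table, new_char, anchor):
    """Place new_char just after anchor, using the free gap (no renumbering)."""
    higher = [w for w in table.values() if w > table[anchor]]
    upper = min(higher) if higher else table[anchor] + GAP
    if upper - table[anchor] < 2:
        raise ValueError("gap exhausted; renumbering would be needed")
    table[new_char] = (table[anchor] + upper) // 2

def sort_key(word, table):
    """Single-level key: the list of weights of the word's characters."""
    return [table[ch] for ch in word]

tailor_after(weights, "ä", "a")   # hypothetical tailoring: sort "ä" after "a"
words = ["b", "ä", "a", "z"]
result = sorted(words, key=lambda w: sort_key(w, weights))
print(result)
```

The existing weights are untouched by the insertion; only the new entry is
added, in the space the gap left free.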
A fully Unicode-compliant collation algorithm implementing the DUCET is not
required to use the same weights; it only has to preserve their relative order
and composition.
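A small Python sketch of that point, under heavy simplification: two invented
tables with two levels per character (primary, secondary) use different numbers
but the same relative order on each level, and they sort a word list
identically. The tables and the sort_key helper are assumptions made up for
this example, and the key construction is far simpler than what the UTS
specifies.

```python
# Hypothetical two-level weights (primary, secondary); not DUCET values.
table_a = {"a": (10, 1), "á": (10, 2), "b": (20, 1), "c": (30, 1)}
# Different numbers, but the same relative order on each level.
table_b = {"a": (1, 5), "á": (1, 9), "b": (2, 5), "c": (7, 5)}

def sort_key(word, table):
    """Compare all primary weights first, then all secondary weights."""
    primaries = tuple(table[ch][0] for ch in word)
    secondaries = tuple(table[ch][1] for ch in word)
    return (primaries, secondaries)

words = ["cab", "áb", "ab", "ba", "ac"]
order_a = sorted(words, key=lambda w: sort_key(w, table_a))
order_b = sorted(words, key=lambda w: sort_key(w, table_b))
print(order_a == order_b)
```

Note that the accent only breaks the tie between "ab" and "áb"; it does not
push "áb" after "ac", because the primary level is compared in full first.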
The introductory message described what could be done, but the UTS document
describes things in more detail.