Re: Collation (was RE: [OT] o-circumflex)

From: David Gallardo (dgallardo@mediaone.net)
Date: Thu Sep 13 2001 - 11:35:00 EDT


Java's java.text package has a rule-based collator class, RuleBasedCollator, that is in effect
programmable using a little language. Here is an example from Sun's API
doc for Norwegian:

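import java.text.RuleBasedCollator;

String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
                   "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
                   "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
                   "< å=a\u030A,Å=A\u030A" +  // å/Å, also as a/A + combining ring above
                   ";aa,AA< æ,Æ< ø,Ø";        // aa/AA differ from å only at secondary strength
// The constructor throws java.text.ParseException if the rules are malformed
RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);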
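A minimal, self-contained sketch of putting that collator to use, assuming a standard JDK's java.text.RuleBasedCollator. The class name NorwegianSortDemo and the sample words are illustrative only; the rule string is the same as above, written with Unicode escapes so the source file's encoding does not matter. Note that these rules file å before æ and ø, while the present-day Norwegian alphabet ends æ, ø, å, so the last relations would need to be rearranged for current Norwegian.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NorwegianSortDemo {
    // The same rules as the Sun example, written with Unicode escapes
    static final String NORWEGIAN =
        "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
        "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
        "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
        "< \u00E5=a\u030A,\u00C5=A\u030A" +     // a-ring (U+00E5/U+00C5), precomposed or decomposed
        ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8"; // then aa, ae-ligature and o-slash

    static RuleBasedCollator norwegianCollator() {
        try {
            return new RuleBasedCollator(NORWEGIAN);
        } catch (ParseException e) {
            // Only reached if the rule string above is malformed
            throw new IllegalStateException(e);
        }
    }

    // Returns a new list holding the words in the order the rules define
    static List<String> sortNorwegian(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(norwegianCollator()); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("\u00F8l", "zebra", "\u00E6re", "\u00E5r", "alle");
        System.out.println(sortNorwegian(words));
    }
}
```

The sample words should come out as alle, zebra, år, ære, øl: everything from a to z first, then å, æ and ø as separate letters after z, exactly as the rule string orders them.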

The rule syntax also covers reverse secondary ordering (the '@' modifier,
used for French accents), contractions, and expansions.
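The sketch below shows only the contraction case, assuming that Collator.getInstance(Locale.US) returns a RuleBasedCollator (as it does in the standard JDK). It appends a reset rule, "& C < ch, cH, Ch, CH", to the English rules, so that "ch" becomes a single collation element sorting as its own letter between c and d, much as traditional Spanish treated it. ContractionDemo and its method names are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ContractionDemo {
    // Reset to C, then declare "ch" (in all case variants) as one collation
    // element that sorts as a separate letter right after c/C
    static final String CH_RULES = "& C < ch, cH, Ch, CH";

    static RuleBasedCollator withChContraction() {
        // Assumes the JDK's US collator is a RuleBasedCollator
        RuleBasedCollator english = (RuleBasedCollator) Collator.getInstance(Locale.US);
        try {
            return new RuleBasedCollator(english.getRules() + CH_RULES);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a new list holding the words sorted by the given collator
    static List<String> sortWith(Collator collator, List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("dedo", "chico", "cuna");
        System.out.println(sortWith(Collator.getInstance(Locale.US), words)); // plain English order
        System.out.println(sortWith(withChContraction(), words));           // ch as its own letter
    }
}
```

With the plain English collator, chico sorts before cuna; with the extra rule, cuna comes first, because ch now sorts after every other word beginning with c.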

- David Gallardo

----- Original Message -----
From: "Edward Cherlin" <Edward.Cherlin.SY.67@aya.yale.edu>
To: <unicode@unicode.org>
Sent: Thursday, September 13, 2001 3:40 AM
Subject: Collation (was RE: [OT] o-circumflex)

> English and several other languages have dozens of collations. Compare
> telephone books, library catalogs, book indexes (sic), and other sorted
> data. Knuth's The Art of Computer Programming, Vol. 3: Sorting and
> Searching gives an example of a set of library sorting rules that runs to
> more than a page, and suggests programming it as an exercise. ;-) One of
> the rules is to spell out numbers. For example,
>
> 1984 (Nineteen Eighty Four)
> 1066 and all that (Ten Sixty Six)
> 3001 (Three Thousand One)
> 2050 (Twenty Fifty)
> 2010 (Twenty Ten)
> 2001, A Space Odyssey (Two Thousand One)
>
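The spelled-out forms above do not follow one consistent rule (2001 becomes "Two Thousand One" but 2010 becomes "Twenty Ten"), so a minimal sketch is to let a cataloguer supply a filing key for each title and sort on that key with a java.text.Collator instead of on the title. FilingKeySort and Entry are hypothetical names; the keys are the parenthesized forms from the list.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class FilingKeySort {
    // A catalogue entry: the title as printed, plus a hand-assigned filing key
    static final class Entry {
        final String title;
        final String filingKey;

        Entry(String title, String filingKey) {
            this.title = title;
            this.filingKey = filingKey;
        }
    }

    // Sorts entries on their filing keys and returns the titles in filing order
    static List<String> fileByKey(List<Entry> entries) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((Entry e) -> e.filingKey, collator));
        List<String> titles = new ArrayList<>();
        for (Entry e : sorted) {
            titles.add(e.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Entry> catalogue = Arrays.asList(
            new Entry("1066 and all that", "Ten Sixty Six"),
            new Entry("1984", "Nineteen Eighty Four"),
            new Entry("2001, A Space Odyssey", "Two Thousand One"),
            new Entry("2010", "Twenty Ten"),
            new Entry("2050", "Twenty Fifty"),
            new Entry("3001", "Three Thousand One"));
        System.out.println(fileByKey(catalogue));
    }
}
```

Sorting on the keys reproduces the order of the list (1984, 1066 and all that, 3001, 2050, 2010, 2001, A Space Odyssey), whereas a plain sort of the titles would leave them in numeric order.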
> Bell Labs invented a whole programming language, SNOBOL, to deal with
> telephone listing conversions, matches, and sorts. Many phone books file
> Mc- and Mac- names together; others put them one after the other, as
> groups separate from the other names.
>
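One way to get the first of those phone-book conventions, interfiling Mc- with Mac-, is to sort on a derived key in which a leading "Mc" is read as "Mac". This is a minimal sketch under that single assumed rule; the class McMacFiling and its filingKey helper are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class McMacFiling {
    // Hypothetical filing rule: a leading "Mc" files as if spelled "Mac"
    static String filingKey(String surname) {
        return surname.startsWith("Mc") ? "Mac" + surname.substring(2) : surname;
    }

    // Returns the surnames sorted on their filing keys, Mc- and Mac- interfiled
    static List<String> interfile(List<String> surnames) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        List<String> sorted = new ArrayList<>(surnames);
        sorted.sort(Comparator.comparing(McMacFiling::filingKey, collator));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Madden", "McKay", "Mabry", "MacDonald", "Mace");
        System.out.println(interfile(names));
    }
}
```

For the sample names this should give Mabry, MacDonald, Mace, McKay, Madden; a plain alphabetical sort would put McKay after Madden instead.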
> Edward Cherlin
> Generalist
> "A knot! Oh, do let me help to undo it."
> Alice in Wonderland



This archive was generated by hypermail 2.1.2 : Thu Sep 13 2001 - 11:34:23 EDT