Re: Collation (was RE: [OT] o-circumflex)

From: Mark Davis (mark@macchiato.com)
Date: Thu Sep 13 2001 - 12:10:54 EDT


In the latest ICU, we took the work we did for Java collation and extended
it substantially (and made it many times faster). It also allows arbitrary
customization at runtime.

I happen to be giving a presentation on it in a few hours at the conference.
For more information, see the draft collation chapter in the User Guide, at
http://oss.software.ibm.com/icu/. The presentation (a slightly older draft)
is on my site at www.macchiato.com.

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Όμήρου Μαργίτῃ
(He knew many things, but knew them all badly. Homer, Margites)
[http://www.macchiato.com]
----- Original Message -----
From: "David Gallardo" <dgallardo@mediaone.net>
To: "Edward Cherlin" <Edward.Cherlin.SY.67@aya.yale.edu>; <unicode@unicode.org>
Sent: Thursday, September 13, 2001 8:35 AM
Subject: Re: Collation (was RE: [OT] o-circumflex)

> Java's collation class has a rule-based collator that is in effect
> programmable using a little language. Here is an example from Sun's API
> doc for Norwegian:
>
> String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J"
>     + "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T"
>     + "< u,U< v,V< w,W< x,X< y,Y< z,Z"
>     + "< å=a\u030A,Å=A\u030A"   // U+030A is COMBINING RING ABOVE
>     + ";aa,AA< æ,Æ< ø,Ø";
> // The constructor throws java.text.ParseException if the rules are malformed.
> RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
>
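A runnable version of the quoted rules, as a sketch: it wraps the same rule string in a class and sorts a few words with it. The class name `NorwegianRules`, the helper names, and the sample words are illustrative assumptions, not part of Sun's doc. It follows the quoted rule string as written, which places å (and the contraction "aa") after z but before æ and ø; the modern Norwegian alphabet instead ends æ, ø, å.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

public class NorwegianRules {
    // Rule string from the quoted example. U+030A (COMBINING RING ABOVE) after "a"
    // is declared identical to the precomposed a-with-ring; "aa" becomes a
    // contraction that sorts just after it (secondary difference, ';').
    static final String NORWEGIAN =
          "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J"
        + "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T"
        + "< u,U< v,V< w,W< x,X< y,Y< z,Z"
        + "< \u00E5=a\u030A,\u00C5=A\u030A"
        + ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";

    static Collator norwegian() {
        try {
            return new RuleBasedCollator(NORWEGIAN);
        } catch (ParseException e) {
            // Only reachable if the rule string itself is malformed.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: returns a sorted copy using the custom rules.
    static List<String> sort(List<String> words) {
        List<String> out = new ArrayList<>(words);
        out.sort(norwegian());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(List.of("\u00F8l", "zebra", "aal", "abc", "\u00E6re")));
    }
}
```

Note that "aal" sorts after "zebra": because "aa" appears in the rules, the collator treats it as a single contracted element ranked with å rather than as two separate "a"s.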
> There is also syntax for things such as specifying reverse order (for
> French accents, for example), contractions, and expansions.
>
> - David Gallardo
>
> ----- Original Message -----
> From: "Edward Cherlin" <Edward.Cherlin.SY.67@aya.yale.edu>
> To: <unicode@unicode.org>
> Sent: Thursday, September 13, 2001 3:40 AM
> Subject: Collation (was RE: [OT] o-circumflex)
>
>
> > English and several other languages have dozens of collations. Compare
> > telephone books, library catalogs, book indexes (sic), and other sorted
> > data. Knuth, vol. 3, Sorting and Searching, gives an example of a set of
> > library sorting rules that runs to more than a page, and suggests
> > programming it as an exercise. ;-) One of the rules is to file numbers
> > as though they were spelled out.
> > For example,
> >
> > 1984 (Nineteen Eighty Four)
> > 1066 and all that (Ten Sixty Six)
> > 3001 (Three Thousand One)
> > 2050 (Twenty Fifty)
> > 2010 (Twenty Ten)
> > 2001, A Space Odyssey (Two Thousand One)
> >
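The spelled-out rule amounts to sorting on a derived key rather than on the raw title. Here is a minimal sketch under stated assumptions: only a leading four-digit number is rewritten, and `spellYear` is a hypothetical helper whose reading convention is inferred from the examples above ("Two Thousand One" for 2001 and 3001, otherwise pairs such as "Nineteen Eighty Four"). It is not a general number speller and not Knuth's rule set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpelledOutSort {
    private static final String[] ONES = {"", "One", "Two", "Three", "Four", "Five",
        "Six", "Seven", "Eight", "Nine", "Ten", "Eleven", "Twelve", "Thirteen",
        "Fourteen", "Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"};
    private static final String[] TENS = {"", "", "Twenty", "Thirty", "Forty",
        "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"};
    private static final Pattern LEADING_YEAR = Pattern.compile("^(\\d{4})(.*)$");

    // Spell out 1..99.
    static String under100(int n) {
        if (n < 20) return ONES[n];
        String t = TENS[n / 10];
        return n % 10 == 0 ? t : t + " " + ONES[n % 10];
    }

    // Hypothetical year reading for 1000..9999, inferred from the examples.
    static String spellYear(int y) {
        int hi = y / 100, lo = y % 100;
        if (hi % 10 == 0 && lo < 10) {              // 2001 -> "Two Thousand One"
            String s = ONES[hi / 10] + " Thousand";
            return lo == 0 ? s : s + " " + ONES[lo];
        }
        if (lo == 0) return under100(hi) + " Hundred";
        return under100(hi) + " " + (lo < 10 ? "Oh " + ONES[lo] : under100(lo));
    }

    // Sort key: a leading four-digit number is replaced by its spelled-out form.
    static String sortKey(String title) {
        Matcher m = LEADING_YEAR.matcher(title);
        return m.matches() ? spellYear(Integer.parseInt(m.group(1))) + m.group(2) : title;
    }

    static List<String> sortTitles(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort(Comparator.comparing(SpelledOutSort::sortKey, String.CASE_INSENSITIVE_ORDER));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortTitles(List.of("2001, A Space Odyssey",
            "1066 and all that", "1984", "3001", "2050", "2010")));
    }
}
```

Sorting the shuffled titles this way reproduces the order of the list above, even though the numeric order of the years is quite different.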
> > Bell Labs invented a whole programming language, SNOBOL, to deal with
> > telephone listing conversions, matches, and sorts. Many phone books sort
> > Mc- and Mac- together; others file them one after the other, but
> > separate from other names.
> >
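The first phone-book convention ("Mc" and "Mac" filed together) can be sketched the same way, as a sort key that files every "Mc" name as if it were spelled "Mac". The class name, method names, and sample names are hypothetical. The `startsWith("Mc")` test is deliberately naive; a real directory would need rules for names such as "Mcnally" or for the second convention.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class McMacSort {
    // Hypothetical phone-book rule: file "Mc..." as if it were "Mac...".
    static String sortKey(String name) {
        return name.startsWith("Mc") ? "Mac" + name.substring(2) : name;
    }

    static List<String> sortNames(List<String> names) {
        List<String> out = new ArrayList<>(names);
        out.sort(Comparator.comparing(McMacSort::sortKey, String.CASE_INSENSITIVE_ORDER));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortNames(
            List.of("McDonald", "Mabry", "MacArthur", "Madden", "McAdam")));
    }
}
```

Here McAdam and McDonald interleave with MacArthur instead of following all the "Mac" names.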
> > Edward Cherlin
> > Generalist
> > "A knot! Oh, do let me help to undo it."
> > Alice in Wonderland



This archive was generated by hypermail 2.1.2 : Thu Sep 13 2001 - 11:55:21 EDT