Re: [OT] o-circumflex

From: Mark Davis (mark@macchiato.com)
Date: Sat Sep 08 2001 - 00:04:07 EDT


I disagree. What you want is a merged database field. See
http://www.macchiato.com/slides/icu_collation.ppt

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Όμήρου Μαργίτῃ
[http://www.macchiato.com]
----- Original Message -----
From: "Asmus Freytag" <asmusf@ix.netcom.com>
To: "David Gallardo" <dgallardo@mediaone.net>; "Ayers, Mike"
<Mike_Ayers@bmc.com>; "'David Starner'" <dstarner98@aasaa.ofe.org>;
<unicode@unicode.org>
Sent: Friday, September 07, 2001 11:50
Subject: Re: [OT] o-circumflex

> At 01:06 PM 9/7/01 -0400, David Gallardo wrote:
> >As a practical matter, you need to take the diacritics into account when
> >sorting, even in English where they (may or may not) have linguistic
> >significance, otherwise you'll get nondeterministic behaviour. In other
> >words, résumé and resume should fall together, but always in the same
> >order.
>
> Stated absolutely, this is patent, but oft-repeated nonsense. For example,
> it does not always make sense for a list of names. An old friend of mine,
> Jon Proppe, who is an Icelandic art critic, spells his name with an accent
> grave on the first o and an acute accent on the e. In a campus directory
> of the US university he attended (assuming it did not strip the accents),
> it would make no sense to have his name show up after all the Proppes, or
> all the Jons without an accent (depending on whether it's sorted by first
> or last name).
>
> If I sort a list of single words which contains non-unique entries, a
> stable sort would sort the non-unique subsets in the order of their
> appearance in the input. If it's not important to distinguish between naive
> and naïve (e.g. in a machine-generated index that spans multiple documents
> with differences in the use of accents) it's hard to see what's gained in
> splitting the list in two for this case.
>
> On the other hand, if San Jose and San José are correctly and consistently
> distinguished in my input, they should probably sort separately.
>
> The two cases of resume are different yet again, as noted, since one could
> be a verb form.
>
> It all depends not on whether a distinction can be made, but whether it is
> meaningful in the context of the list being sorted.
>
> A./
>
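
For anyone following the thread, the two behaviours under discussion can be sketched in a few lines of Python. This is only a rough approximation using the standard unicodedata module, not ICU collation or anything taken from the slides linked above; the function names (primary_key, accent_blind_sort, accent_tiebreak_sort) are made up for illustration. The first sort ignores accents and relies on a stable sort, so equal entries keep their input order. The second uses the accents as a tie-breaker, so the result no longer depends on input order.

```python
import unicodedata


def primary_key(word):
    """Base letters only: decompose (NFD), drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_blind_sort(words):
    # sorted() is stable, so words whose primary keys compare equal
    # stay in the order in which they appeared in the input.
    return sorted(words, key=primary_key)


def accent_tiebreak_sort(words):
    # Compare base letters first; only when they are equal does the
    # decomposed form (accents included) decide the order. This is a
    # crude code-point comparison, not a language-aware accent ordering.
    return sorted(
        words,
        key=lambda w: (primary_key(w), unicodedata.normalize("NFD", w)),
    )


words = ["résumé", "naive", "resume", "naïve"]
blind = accent_blind_sort(words)      # order within each pair follows the input
strict = accent_tiebreak_sort(words)  # order within each pair is fixed
```

Which of the two is appropriate is exactly the point of disagreement above: the accent-blind version keeps accented and unaccented spellings of a name together in input order, while the tie-break version always places them in the same relative order.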



This archive was generated by hypermail 2.1.2 : Sat Sep 08 2001 - 00:48:25 EDT