Re: DIN 5007, Swiss Sorting

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Sun Mar 12 2000 - 20:02:09 EST


Michael Everson wrote on 2000-03-12 15:24 UTC:
> At 20:32 -0800 2000-03-11, Alain wrote:
> >Michael Everson disagrees that for English lower case should be sorted
> >first in case of quasi homographs (ex. : august before August), based on
> >what he deduces from the short version of the OED.
>
> I don't just disagree or deduce, I gave actual evidence. The "short
> version" Alain refers to is the Concise Oxford Dictionary of Current
> English. See http://www.egt.ie/standards/iso10646/pdf/n688.pdf for a review
> of dictionaries.

What troubles me with all the work on the international sorting standard
is that far too much emphasis is placed on so-called "existing practice".
Let's face it: there is no such thing. Extremely few people know or
agree about the algorithmic details of the traditional sorting order in
their respective locale. Very few countries have detailed formal
standards (such as the German DIN 5007), and even in these countries
these standards are so little known that lots of modifications could be
made without anyone noticing. "Anyone" includes experts such as
dictionary publishers, who tend to be no less confused than the
ordinary person.

The UCS sorting standard should aim to be

  - user-friendly
  - easy to remember
  - helpful in manually locating words in huge sorted lists
  - practical
  - efficiently implementable
  - consistent and simple across many different languages and scripts

and alternatives should please be discussed in these terms and not in
terms of compatibility with this or that dictionary.

Compatibility with the precise details of existing national standards is
completely irrelevant, unless more than 0.1 % of the population in the
respective locale are actually familiar with this practice.

E.g., I have asked over a dozen French people (academics and frequent
users of dictionaries) about the idea, very unexpected in my eyes, of
sorting French accents in reverse order (last character most
significant), and I have not yet found anyone (except i18n experts who
followed the discussion here) who knew about these rules. I have not yet
found a single person who was able to explain to me why it is a good
idea to sort French accents in reverse at all. Is there any reason,
apart from the fact that some dictionaries seem to do it?
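
For readers who have not met the rule, here is a minimal Python sketch of
what "accents in reverse" means in practice. It is only an illustration:
the helper names (split_accents, forward_key, backward_key) are made up
for this example, and ranking accents by the code point of their
combining mark is a simplifying assumption, not the weighting of any
standard or dictionary.

```python
import unicodedata


def split_accents(word):
    """Split a word into base letters and the accent marks on each letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch          # attach the mark to the preceding letter
        else:
            bases.append(ch.lower())
            accents.append("")         # no accent seen on this letter yet
    return bases, accents


def forward_key(word):
    """Base letters first; accent ties broken from the first letter onwards."""
    bases, accents = split_accents(word)
    return bases, accents


def backward_key(word):
    """Base letters first; accent ties broken from the last letter backwards."""
    bases, accents = split_accents(word)
    return bases, accents[::-1]


# Hypothetical sample list: four words that differ only in their accents.
words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=forward_key))
print(sorted(words, key=backward_key))
```

With these four words, the forward key gives cote, coté, côte, côté,
while the backward key gives cote, côte, coté, côté. The two rules differ
only in which end of the word decides an accent tie.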

Perhaps it will turn out in the end to have originated as nothing more
than a programming error in the software of some dictionary publisher,
which has been kept ever since without being questioned again.

Again: if you argue for A < a versus a < A, then please explain to me
why one is better than the other, and please do so independently of
existing practice.
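
To show how small the technical difference between the two options is,
here is a rough Python sketch, assuming a simple two-level comparison:
letters are compared without regard to case, and case is consulted only
to break ties. The function names are hypothetical and the word list is
just a sample.

```python
def lower_first_key(word):
    """Compare letters ignoring case; on a tie, lower case sorts first (a < A)."""
    return word.lower(), [ch.isupper() for ch in word]


def upper_first_key(word):
    """Compare letters ignoring case; on a tie, upper case sorts first (A < a)."""
    return word.lower(), [ch.islower() for ch in word]


# Sample words, including the quasi-homographs "august" and "August".
words = ["August", "april", "august", "April"]
print(sorted(words, key=lower_first_key))
print(sorted(words, key=upper_first_key))
```

The first call yields april, April, august, August and the second April,
april, August, august. Either choice costs the same to implement, so
efficiency cannot settle the question.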

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


