Collating caps and smalls

From: Michael Everson (everson@indigo.ie)
Date: Sat Feb 08 1997 - 04:40:26 EST


Re: http://www.indigo.ie/egt/standards/capsmall.html

At 08:40 -0800 1997-02-07, Mark Davis wrote:
>
>The examples you cite in your paper seem to have little to do with
>ordering of words.

You're right, Mark. The exercise was primarily to determine whether there
were *any* criteria for preferring capitals-before-smalls or
smalls-before-capitals with regard to ranking.

>1. Outlines...
>From your principle, you would sort I<II<...A<B...<1<2...<a<b...
>I don't know about Ireland, but in the US (or at least California; can't
>speak for Easterners) we would not sort digits between uppercase and
>lowercase.

The outline format was not intended to be a guide to sorting; rather an
example of how, in a vertical list such as that, the capital letters do
precede the small ones. People are used to that -- and I don't know a
counterexample. (You can't use the dictionary example because I have so
many dictionaries with AaZz and aAzZ that such data doesn't help.)

>> "order of honour" ... "feeling" ... August should precede august
>This is rather circular. Lots of people have "feelings", and not all of
>these are the same as yours.

Actually the "order of honour" was suggested to me by someone else, and it
offers a rationale for choosing one AaZz over aAzZ. That's all.

>> We write Aachen. We don't write aAchen.
>I'm puzzled. What does the fact that we capitalize the first letter
>within a word have to do with the order of words?

Again, the exercise was to investigate whether *any* kind of "naturalness"
obtained with regard to the question. Is it just me? I really see a
difference between AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzŜŝ
and aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZŝŜ whether in a
list horizontal or vertical....

>> Historically, CAPITAL LETTERS existed before the small letters.
>I take it you would then sort Z<J<W!
>We do not, in general sort each letter by its historical introduction.

The exercise was with regard to the classes "CAPITAL" and "SMALL" not with
regard to the individual letters.

>Myself, I don't care either way. The only real issue is what people's
>expectations are for real sorted lists; that is established by looking at
>live examples (dictionaries, phonebooks, etc.).

That was the problem. Those sources, even within a given language, often
disagree. Some sources ignore case entirely. The reason I undertook the
exercise was that in TC304 we are making a default, multilingual European
locale (to be tailored locally) and this was an unresolved issue.

Alain will say that most people don't have any expectations at all. The
only rationale I have ever heard for smalls-before-capitals was that "small
letters occur more frequently and are therefore more general".

Other interesting facts are that Greek and Cyrillic are sorted
caps-before-smalls. And in the tables in 10646, for the overwhelming
majority of cased pairs (whether contiguous or not) capitals precede smalls.
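
The claim about cased pairs can be checked roughly by machine. The sketch
below is only an illustration, and it makes two assumptions that are not in
the original argument: it uses the Unicode character database that ships
with Python (a much later repertoire than the 10646 tables of 1997), and it
counts only simple one-to-one pairs that round-trip through lower() and
upper(). The function name count_cased_pairs is hypothetical.

```python
import sys
import unicodedata


def count_cased_pairs():
    """Count cased pairs by which member has the lower code point.

    Returns (caps_first, smalls_first).
    """
    caps_first = 0
    smalls_first = 0
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Only start from uppercase letters, so each pair is seen once.
        if unicodedata.category(ch) != "Lu":
            continue
        low = ch.lower()
        # Skip letters with no lowercase, multi-character mappings,
        # and pairs that do not map back to the same capital.
        if len(low) != 1 or low == ch or low.upper() != ch:
            continue
        if cp < ord(low):
            caps_first += 1
        else:
            smalls_first += 1
    return caps_first, smalls_first


caps_first, smalls_first = count_cased_pairs()
print("capital encoded first:", caps_first)
print("small encoded first:  ", smalls_first)
```

The exact counts depend on the Unicode version bundled with the interpreter,
so they say nothing precise about the 1997 tables; they only show which way
the encoding order leans.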

>The COED is certainly
>strong evidence for one direction; most of the other dictionaries we
>have consulted use the other (lowercase before uppercase).

Some years ago I went through all my dictionaries. It was something like
50% AaZz, 40% aAzZ, 10% aAaZzZ....

>In the end, ideally you would have a toggle, as ISO CD 14561 specifies.

Et voilà.
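
To make the idea of a toggle concrete, here is a minimal sketch. It is not
the mechanism specified in the ISO draft cited above; it assumes a simple
two-level key in which words are first compared without regard to case, and
case is consulted only to break ties. The names collation_key and caps_first
are hypothetical.

```python
def collation_key(word, caps_first=True):
    """Build a two-level sort key with a switchable case order."""
    # First level: compare the letters themselves, ignoring case.
    primary = word.casefold()
    # Second level: one flag per character; 0 sorts before 1.
    if caps_first:
        case_flags = tuple(0 if c.isupper() else 1 for c in word)
    else:
        case_flags = tuple(1 if c.isupper() else 0 for c in word)
    return (primary, case_flags)


words = ["august", "Zebra", "August", "apple", "Apple", "zebra"]

caps = sorted(words, key=lambda w: collation_key(w, caps_first=True))
smalls = sorted(words, key=lambda w: collation_key(w, caps_first=False))

print(caps)    # ['Apple', 'apple', 'August', 'august', 'Zebra', 'zebra']
print(smalls)  # ['apple', 'Apple', 'august', 'August', 'zebra', 'Zebra']
```

Because case only breaks ties, flipping the toggle never moves "August"
away from "august"; it only decides which of the two comes first, which is
the AaZz versus aAzZ choice discussed above.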

--
Michael Everson, Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire (Ireland)
Gutháin:  +353 1 478-2597, +353 1 283-9396
http://www.indigo.ie/egt
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


