Re: data for collation tests

From: Michael Everson (everson@indigo.ie)
Date: Fri Feb 07 1997 - 09:06:09 EST


>Mark Davis writes:
>
>> 1. Apparently, as opposed to English, Danish sorts space and hyphen as
>> separate characters, not as ignorable secondaries (e.g. ignored on first
>> pass). For example, in English one sorts as in the following:
>>
>> black
>> black-and-blue
>> black and white
>> blackbird
>> black bird
>> black-bird
>> blackbirds
>> black birds
>> black-birds
>> blackbox
>> black-eyed pea
>> blackfish
>> black lung
>>
>> and NOT as:
>>
>> black and white
>> black bird
>> black birds
>> black lung
>> black-and-blue
>> black-bird
>> black-birds
>> black-eyed pea
>> black
>> blackbird
>> blackbirds
>> blackbox
>> blackfish

Letter-by-letter sorting is used in Danish; word-by-word sorting is used in
English. ISO CD 14651 specifies a toggle in the default sort since you can
never tell, really, what people will want. (Alain LaBonté doesn't like
word-by-word sorting; most OSes use it, however.)
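
The two orderings in the quoted lists can be sketched as a choice of sort
key. This is only an illustration, not the mechanism of ISO CD 14651: the
function names, the ignore_separators flag, and the tie-break rule (no
separator, then space, then hyphen) are assumptions read off the first
quoted list.

```python
# Hypothetical sketch; all names here are illustrative assumptions.
SEPARATORS = {" ": 0, "-": 1}  # assumed tie-break rank: space before hyphen


def ignoring_key(s):
    """Key that skips space and hyphen on the first pass."""
    # First pass: compare the letters only.
    letters = "".join(ch for ch in s if ch not in SEPARATORS)
    # Second pass: break ties by where separators occur and which they are.
    # An entry with no separators gets an empty tuple and so sorts first.
    seps = tuple((i, SEPARATORS[ch]) for i, ch in enumerate(s) if ch in SEPARATORS)
    return (letters, seps)


def significant_key(s):
    """Key that treats space and hyphen as characters ranked before letters."""
    return tuple(
        (0, SEPARATORS[ch]) if ch in SEPARATORS else (1, ord(ch)) for ch in s
    )


def sort_entries(entries, ignore_separators=True):
    """Sort with the toggle choosing between the two keys."""
    key = ignoring_key if ignore_separators else significant_key
    return sorted(entries, key=key)
```

With ignore_separators=True this reproduces the first quoted list. With
ignore_separators=False it gives the second quoted list except that the
bare "black" comes first, because under this simple key a prefix always
sorts ahead of its extensions.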

>> 2. In English dictionaries as a rule, uppercase comes before lowercase,
>> as in:
>>
>> polish
>> Polish

You've got your terms wrong, but I would dispute this. AaBbCcDdEeFfGg is
normal for English (remember the alphabet chart in your classroom, Mark?);
aAbBcCdDeEfFgG is not. I would consider

August
august
God
god
May
may
Polish
polish

to be correct in English. See
http://www.indigo.ie/egt/standards/capsmall.html for a discussion of the
issues. ISO CD 14651 specifies a toggle in the default sort since it is
impossible to please everyone with Aa or with aA even within a single
language (like English).
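
An Aa/aA toggle of this kind might be sketched as follows. The names
case_key, sort_words and upper_first are hypothetical, and the two-level
key (letters first, case only as a tie-break) is an assumption for
illustration, not the text of the standard.

```python
# Hypothetical sketch; names and the two-level key are assumptions.
def case_key(word, upper_first=True):
    """Compare letters without regard to case, then break ties on case."""
    primary = word.casefold()
    # The toggle decides which case gets the lower (earlier) rank.
    upper_rank, lower_rank = (0, 1) if upper_first else (1, 0)
    case_pattern = tuple(upper_rank if ch.isupper() else lower_rank for ch in word)
    return (primary, case_pattern)


def sort_words(words, upper_first=True):
    """Sort words with either Aa (upper_first) or aA ordering."""
    return sorted(words, key=lambda w: case_key(w, upper_first))
```

With upper_first=True the list above comes out as August, august, God,
god, May, may, Polish, polish; with upper_first=False each pair is
swapped.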

bEST REGARDS,

--
Michael Everson, Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire (Ireland)
Gutháin:  +353 1 478-2597, +353 1 283-9396
http://www.indigo.ie/egt
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


