Re: (TC304WG1.241) data for collation tests

From: Michael Everson (everson@indigo.ie)
Date: Sat Feb 08 1997 - 05:08:48 EST


At 16:17 -0500 1997-02-07, Alain LaBonté wrote:
>
>It would be preferable in information technology, as far as sorts are
>concerned, to use the term "field", which is traditional in computer
>programming. "Cultural sort" of a field should not care about spaces
>(or it should, if one really means it, a special space should then be
>used for this, as in Canadian standard CAN/CSA Z243.4.1).

No, no, Alain, this is exactly what we never agree on, which is why we have
the toggle. It is equally valid for me to say: "Cultural sort" of a field
should care about spaces (or it should not, if one really means it; a special
space should then be used for this, as in the Mac OS and Windows).

This is why we have the toggle, folks.

>If a space is a delimiter of two
>entities that you call a "set phrase", then another field should be used
>instead of using an artificial delimiter.

Spaces separate entities in plain text. Other field separators, in
databases and the like, are commas, tabs, and semicolons.

>But in TC37 terminology, the expressions "word by word" and "character by
>character" are wrong as far as an actual understanding of what is going on
>is concerned. I can assure you that what they call "word by word" is more
>"character by character" than the other method.

The use of the terms word-by-word and letter-by-letter (John Clews can
probably tell us) goes back a long way in libraries with paper cards, I
suspect. Certainly it did not originate with TC37 or Gavaré.

>It is a pure
>positional, character by character sort, where even spaces are counted as
>characters, while the other method ignores these characters at the first
>level of comparison, as do human beings when they search in a dictionary
>(telephone book directories sort by fields, first names first, second names
>after, and within a name, sort is done as it would be done in a dictionary).
>So their (TC37's) terminology has to be changed. I've been saying this for
>years, but it does not seem to be agreed upon by ISO/TC37ers. With what
>they're doing, nobody is going to retrieve me in a telephone book! They
>should revisit their method and they will see that not all spaces were
>created equal.

I don't agree with you either, Alain. You and I have been disagreeing on
this for years.... :-) TC37 isn't going to change its terminology, I
suspect, because its terminology is correct.
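
To make the two orderings and the toggle concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `sort_key` and `spaces_significant` and the sample entries are hypothetical, and it is not the TC37 method, the CAN/CSA Z243.4.1 rule, or any standard collation algorithm. Word-by-word is modelled by comparing the sequence of words, so a space ranks ahead of any letter; letter-by-letter is modelled by dropping spaces at the first level of comparison.

```python
def sort_key(entry, spaces_significant):
    """Return a two-level sort key for a plain-text entry (illustrative only)."""
    folded = entry.casefold()
    if spaces_significant:
        # Word-by-word: compare the sequence of words, so "new" sorts
        # before "newark" and a space effectively precedes every letter.
        primary = tuple(folded.split())
    else:
        # Letter-by-letter: ignore spaces at the first level of comparison.
        primary = "".join(folded.split())
    # Second level: the original string, so ties are broken deterministically.
    return (primary, entry)


# Hypothetical sample data, not taken from the thread.
entries = ["Newark", "New York", "Newton", "New Jersey"]

# The "toggle": the same data sorted both ways.
word_by_word = sorted(entries, key=lambda e: sort_key(e, True))
letter_by_letter = sorted(entries, key=lambda e: sort_key(e, False))

print(word_by_word)      # ['New Jersey', 'New York', 'Newark', 'Newton']
print(letter_by_letter)  # ['Newark', 'New Jersey', 'Newton', 'New York']
```

With spaces significant, both "New ..." entries file ahead of "Newark"; with spaces ignored, "Newark" comes first. Neither result is wrong, which is the case for keeping the choice as a toggle.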

In general a telephone directory tells you what to do in its front matter.
In the Maldives, the houses are listed in the telephone directory by name:
you look up the house, ring it, and then ask for the person you want to
speak to. I guess nobody is going to retrieve you in a Maldivian telephone
book no matter what you do with SP and NBSP. (Don't these cultural tidbits
make it all worthwhile?)

Very best regards,

--
Michael Everson, Everson Gunn Teoranta
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire (Ireland)
Gutháin:  +353 1 478-2597, +353 1 283-9396
http://www.indigo.ie/egt
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire


