From: jameskass@att.net
Date: Thu May 13 2004 - 17:09:52 CDT
Dean A. Snyders asks,
> Why make something we do all the time more difficult and non-standard,
> when what we do now works very well?
>
Please, one thing to remember about default collation is that
it's default. It's only there when no other instructions exist.
Another thing to remember about collation is that it's best
when tailorable.
Anyone wishing to sort anything will want to impose their
own rules on the sort, and anyone who has done this in the
past has already worked out a method for such imposition.
If you're making a library database, do you want "1984" to
sort under the digit "1", would you prefer that it be sorted
under "O" for "one", or would it be better if it sorted under
"N" for "nineteen"? If the database is for biblios rather than
books, you might prefer that the book title be sorted under
"M".
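To make the "1984" case concrete, here is a minimal sketch of a tailoring layered on top of a default ordering. Everything in it is illustrative: the FILE_AS table, the function names, and the use of plain case-folded order as a stand-in for a real default collation are assumptions, not part of any standard.

```python
# Hypothetical tailoring table: how one particular catalogue wants
# certain titles filed. The single entry is purely illustrative.
FILE_AS = {
    "1984": "nineteen eighty-four",
}

def default_key(title):
    # Stand-in for a default collation: plain case-insensitive order.
    return title.casefold()

def tailored_key(title):
    # Apply the catalogue's own rule first, then fall back to the default.
    return default_key(FILE_AS.get(title, title))

titles = ["Nostromo", "1984", "Middlemarch", "Orlando"]
by_default = sorted(titles, key=default_key)
by_tailoring = sorted(titles, key=tailored_key)

print(by_default)    # "1984" files under the digit "1"
print(by_tailoring)  # "1984" files under "N"
```

The default only decides the order when FILE_AS has nothing to say about a title, which is the sense in which a default is a default.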
If someone keys in "nineteen eighty four" to a search box,
and you want them to be able to find "1984" in your database,
you will program for it.
If you want "Richard III" to match with "Richard the third",
a bit of extra work is required.
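The "bit of extra work" for searching might look something like the sketch below, which reduces both the query and the stored title to a common search key. The EQUIVALENTS table and the search_key name are hypothetical, and a real table would be much larger and probably language-specific.

```python
import re

# Hypothetical equivalence table mapping a token to its spelled-out form.
EQUIVALENTS = {
    "1984": "nineteen eighty four",
    "iii": "the third",
}

def search_key(text):
    # Fold case and split into word tokens, discarding punctuation,
    # then expand any token that has a listed spelled-out form.
    tokens = re.findall(r"\w+", text.casefold())
    expanded = [EQUIVALENTS.get(tok, tok) for tok in tokens]
    return " ".join(expanded)

# Two strings match when their search keys are equal.
print(search_key("Richard III") == search_key("Richard the third"))
print(search_key("nineteen eighty four") == search_key("1984"))
```

The stored text is left untouched; only the keys used for matching are rewritten.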
If it's your purpose to set up a Hebrew script/Hebrew language
database of Hebrew inscriptions, and the original script used
in the inscription is irrelevant for your purposes, and you are
importing data from multiple sources who may use alternate
encodings, you will 'normalize' the data upon import. In this
case 'normalize' would include converting the character set
if necessary, transliterating/transcribing to Hebrew characters
if necessary, stripping off points if they're present and not
wanted, and so on.
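Two of those import steps, converting the character set and stripping points, can be sketched as follows. The function names and the choice of Windows-1255 as the source encoding in the example are assumptions for illustration. Transliteration from another script is not shown, since it depends entirely on the source data.

```python
import unicodedata

def strip_points(text):
    # Decompose first, so that precomposed presentation forms split
    # into a base letter plus separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (general category Mn). For Hebrew text this
    # removes vowel points and cantillation accents but keeps the letters.
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

def import_record(raw, source_encoding, keep_points=False):
    # Step 1: convert the source character set to Unicode.
    text = raw.decode(source_encoding)
    # Step 2: strip points unless the database wants them kept.
    return text if keep_points else strip_points(text)

# "shalom" with points, written with escapes so the example is unambiguous.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
raw = pointed.encode("cp1255")  # pretend this arrived from a legacy source
print(import_record(raw, "cp1255"))
```

Note that strip_points removes every nonspacing mark, not only Hebrew ones, so it is suitable only where the imported field is known to hold Hebrew text.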
If you're importing data into a DSS Unicode database, and your
source is using Web Hebrew or another ASCII-masquerade, then
you're already performing normalization.
If you're importing data originally entered in visual order rather
than logical order, you're already normalizing.
If your database includes a field to indicate the original script,
here presuming that the original script is of some interest, and
you want to export something, you'll either export it as Hebrew
text, or you'll 'normalize' it back into the original script on export.
Either way, it's about as hard to program for as allowing for
differences in case, like "TROLL" vs. "troll". And, in either case,
it should be done by the tools and trivial to the users, although
any application which doesn't allow the user to set preferences
and make rules in such an instance is next to worthless.
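For comparison, the case example takes only a few lines, with the user's preference exposed as a parameter. The same_word name and the default value of ignore_case are assumptions made for this sketch.

```python
def same_word(a, b, ignore_case=True):
    # ignore_case stands for the user's preference; the tool does the
    # folding, and the user only decides whether it applies.
    if ignore_case:
        return a.casefold() == b.casefold()
    return a == b

print(same_word("TROLL", "troll"))
print(same_word("TROLL", "troll", ignore_case=False))
```

A script-folding preference could be offered in the same way, by putting the script normalization where the case folding sits here.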
Best regards,
James Kass