From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Thu May 13 2004 - 15:04:14 CDT
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Dean Snyder
> Sent: Thursday, May 13, 2004 10:36 AM
> Rich Gillam of Language Analysis Systems, Inc. Unicode list reader wrote
> at 11:41 AM on Thursday, May 13, 2004:
> ...
> >That's how we got here. The effect it has on sorted lists of words
> >seems pretty uninteresting to me. I can think of two use cases:
> >
> >1. A sorted list of Phoenician words (or words using the Phoenician
> >script range, in whatever language or script) that mixes encoding
> >conventions-- some words use the Phoenician script range and some use
> >the existing Hebrew range. Same letters, same glyphs, different
> >underlying encoding. You want to hide the difference in underlying
> >encoding from the end user.
> >
> >2. A sorted list of Hebrew words, some in modern Hebrew script and some
> >in Paleo-Hebrew (or some other script that uses the Phoenician range).
> >Same language, different glyphs.
> >
> >Both are justification for an interleaved sort order,
No. Both are situations where the data should be normalized before
sorting. In the first case, convert the data into a single encoding
convention. In the second case, convert all the non-Hebrew data to Hebrew.
Then sort away.
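
A minimal sketch of the normalize-then-sort approach for the first case. It
assumes the Phoenician letters sit at U+10900..U+10915 (the range assigned in
a later Unicode version, not something settled in this thread) and that they
map one-to-one onto the 22 non-final Hebrew letters. The names TO_HEBREW,
normalize and sort_normalized are hypothetical, and the ordering is plain
code point order on the normalized form, not a UCA collation.

```python
# Hebrew final forms folded onto their base letters so both sort together.
HEBREW_FINALS = {
    0x05DA: 0x05DB,  # final kaf   -> kaf
    0x05DD: 0x05DE,  # final mem   -> mem
    0x05DF: 0x05E0,  # final nun   -> nun
    0x05E3: 0x05E4,  # final pe    -> pe
    0x05E5: 0x05E6,  # final tsadi -> tsadi
}

# The 22 non-final Hebrew letters, alef (U+05D0) through tav (U+05EA).
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# Assumed Phoenician letter range: 22 letters, alf through tau.
PHOENICIAN = range(0x10900, 0x10916)

# One translation table: Phoenician -> Hebrew, plus final-form folding.
TO_HEBREW = dict(zip(PHOENICIAN, HEBREW_BASE))
TO_HEBREW.update(HEBREW_FINALS)


def normalize(word: str) -> str:
    """Convert a word to the single (Hebrew) encoding convention."""
    return word.translate(TO_HEBREW)


def sort_normalized(words):
    """Sort words by their normalized form, keeping the original spellings."""
    return sorted(words, key=normalize)
```

To store the converted data instead of only ordering by it, sort
`[normalize(w) for w in words]` directly.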
> > but really, how
> >often will either use case come up?
>
> Well, for just one case, if you're a Dead Sea scroll scholar (one of the
> more populated sub-disciplines in Semitic scholarship) all the time and
> every day.
You create daily sorts on the same data? Since I doubt that you are
expecting new words to show up in there, I think that this must mean that
you are sorting different sets of the existing data, yes? For such a case,
just re-sort the prenormalized data.
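
A rough sketch of re-sorting prenormalized data: every word is normalized
once, the keys are kept, and any later selection is sorted from the stored
keys with no further conversion. The helpers build_index and sort_subset are
hypothetical names, and the normalizer is passed in as a parameter, so any
conversion function can be substituted.

```python
def build_index(words, normalize):
    """Normalize every word once and remember its sort key."""
    return {word: normalize(word) for word in words}


def sort_subset(subset, index):
    """Sort any selection of already-indexed words by the stored keys."""
    return sorted(subset, key=index.__getitem__)


# Stand-in normalizer for illustration only; a script-conversion function
# would take its place for mixed Hebrew/Phoenician data.
index = build_index(["Beta", "alpha", "Gamma"], str.casefold)
todays_selection = sort_subset(["Gamma", "alpha"], index)
```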
> >Do you really expect-- in EITHER
> >case-- to have long lists of words that need to be mechanically sorted?
>
> Yes.
Normalization makes for faster sorting than interfiling.
> >Do you expect it to happen often enough that hacking together a Perl
> >script to do it once isn't going to get the job done?
>
> Yes.
One normalization script could be used any number of times. Clip,
normalize, sort - repeat as necessary.
> >Why is this a
> >burning issue that has to be enshrined in the default UCA sort order?
>
> [Or even a separate encoding for that matter?] Because of what lies
> behind the responses to your questions above.
I see no substance in your answers so far. Please clarify.
/|/|ike