RE: interleaved ordering (was RE: Phoenician)

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Thu May 13 2004 - 15:04:14 CDT


    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of Dean Snyder
    > Sent: Thursday, May 13, 2004 10:36 AM

    > Rich Gillam of Language Analysis Systems, Inc. Unicode list reader
    > wrote at 11:41 AM on Thursday, May 13, 2004:

    > ...
    > >That's how we got here. The effect it has on sorted lists of words
    > >seems pretty uninteresting to me. I can think of two use cases:
    > >
    > >1. A sorted list of Phoenician words (or words using the Phoenician
    > >script range, in whatever language or script) that mixes encoding
    > >conventions-- some words use the Phoenician script range and some use
    > >the existing Hebrew range. Same letters, same glyphs, different
    > >underlying encoding. You want to hide the difference in underlying
    > >encoding from the end user.
    > >
    > >2. A sorted list of Hebrew words, some in modern Hebrew script and some
    > >in Paleo-Hebrew (or some other script that uses the Phoenician range).
    > >Same language, different glyphs.
    > >
    > >Both are justification for an interleaved sort order,

            No. Both are situations where the data should be normalized before
    sorting. In the first case, convert the data into a single encoding
    convention. In the second case, convert all the non-Hebrew data to Hebrew.
    Then sort away.
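
    The normalize-then-sort approach described above can be sketched in
    Python. This is only an illustration under stated assumptions: it uses
    the Phoenician block at U+10900..U+10915 as it was later encoded (not
    available at the time of this message), maps its 22 letters one-to-one
    onto the non-final Hebrew letters, and sorts by plain code point order
    rather than by UCA. The names normalize and sort_interleaved are
    hypothetical.

```python
# Assumption: Phoenician letters occupy U+10900..U+10915 (22 letters),
# in the same alphabetical order as the Hebrew base letters.
PHOENICIAN_LETTERS = range(0x10900, 0x10916)

# Hebrew letters U+05D0..U+05EA, minus the five final forms, leave 22 letters.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW_BASE = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]

# Translation table: Phoenician code point -> Hebrew code point.
TO_HEBREW = dict(zip(PHOENICIAN_LETTERS, HEBREW_BASE))


def normalize(word: str) -> str:
    """Convert any Phoenician-range letters in word to Hebrew letters."""
    return word.translate(TO_HEBREW)


def sort_interleaved(words):
    """Sort by the normalized form; the original strings are returned as-is."""
    return sorted(words, key=normalize)


if __name__ == "__main__":
    words = [
        "\U00010901\U00010900",  # bet + alef, Phoenician range
        "\u05D0\u05D1",          # alef + bet, Hebrew range
        "\U00010900\U00010902",  # alef + gimel, Phoenician range
    ]
    for w in sort_interleaved(words):
        print([hex(ord(c)) for c in w])
```

    Using the normalized form only as the sort key keeps the stored data
    untouched; to convert the data itself, as suggested above, apply
    normalize to each word first and then sort the result.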

    > >but really, how often will either use case come up?
    >
    > Well, for just one case, if you're a Dead Sea scroll scholar (one of the
    > more populated sub-disciplines in Semitic scholarship) all the time and
    > every day.

            You create daily sorts on the same data? Since I doubt that you are
    expecting new words to show up in there, I think that this must mean that
    you are sorting different sets of the existing data, yes? For such a case,
    just re-sort the prenormalized data.

    > >Do you really expect-- in EITHER
    > >case-- to have long lists of words that need to be mechanically sorted?
    >
    > Yes.

            Normalization makes for faster sorting than interfiling.

    > >Do you expect it to happen often enough that hacking together a Perl
    > >script to do it once isn't going to get the job done?
    >
    > Yes.

            One normalization script could be used any number of times. Clip,
    normalize, sort - repeat as necessary.

    > >Why is this a
    > >burning issue that has to be enshrined in the default UCA sort order?
    >
    > [Or even a separate encoding for that matter?] Because of what lies
    > behind the responses to your questions above.

            I see no substance in your answers so far. Please clarify.

    /|/|ike


