Re: Response to Everson Phoenician and why June 7?

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue May 25 2004 - 14:14:38 CDT


    Peter,

    > >> There is no consensus that this Phoenician proposal is necessary. I
    > >> and others have also put forward several mediating positions e.g.
    > >> separate encoding with compatibility decompositions
    > >
    > >
    > > Which was rejected by Ken for good technical reasons.
    >
    >
    > I don't remember any technical reasons, it was more a matter of "we
    > haven't done it this way before".

    The *reason* we haven't done it this way before is that it
    would cause technical difficulties.

    Compatibility decompositions directly impact normalization.

    Cross-script equivalencing is done by transliteration algorithms,
    not by normalization algorithms.

    If you try to blur the boundary between those two by introducing
    compatibility decompositions to equate characters across separately
    encoded scripts, the net impact would be to screw up *both*
    normalization and transliteration by conflating them. You would
    end up with confusion among both the implementers of such
    algorithms and the consumers of them.
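
    To make the distinction concrete, here is a minimal Python sketch
    using the standard unicodedata module (the Greek-to-Latin table is
    my own illustration, not anything published):

        import unicodedata

        # Normalization only applies the decompositions baked into the
        # standard, e.g. U+FF21 FULLWIDTH LATIN CAPITAL LETTER A:
        assert unicodedata.normalize("NFKC", "\uFF21") == "A"

        # It does *not* equate characters across scripts; Greek Alpha
        # stays Greek Alpha under every normalization form:
        assert unicodedata.normalize("NFKC", "\u0391") == "\u0391"

        # Cross-script equivalencing is a separate, explicit mapping --
        # a transliteration -- applied by whoever needs it:
        GREEK_TO_LATIN = {"\u0391": "A", "\u0392": "B"}  # illustrative only
        def transliterate(s, table=GREEK_TO_LATIN):
            return "".join(table.get(ch, ch) for ch in s)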

    > But perhaps that is only because the
    > need to do this has not previously been identified.

    No, that is not the case.

    > However, I can make
    > a good case for the new Coptic letters being made compatibility
    > equivalent to Greek - which can still be done, presumably -

    But it will not be done. If you attempted to make your case, you
    would soon discover that even *if* such cross-script equivalencing
    via compatibility decompositions were a good idea (which it isn't),
    you would end up with inconsistencies, because some of the Coptic
    letters would have decompositions and some could not (they are
    already in the standard without decompositions, and normalization
    stability guarantees prevent adding any). You'd end up with a
    normalization nightmare (some normalization forms would fold
    Coptic and Greek, others would not), while still not having a
    transliteration solution.

    The UTC would, I predict, reject such a proposal out of hand.
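
    The already-frozen half of that inconsistency is easy to verify;
    a small Python check, again using the unicodedata module:

        import unicodedata

        # The Coptic letters long since encoded in the Greek block have
        # no decompositions (U+03E2 is COPTIC CAPITAL LETTER SHEI), and
        # stability guarantees mean none can ever be added:
        assert unicodedata.decomposition("\u03E2") == ""

        # By contrast, a real compatibility decomposition looks like:
        assert unicodedata.decomposition("\uFF21") == "<wide> 0041"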

    > as well as
    > for similar equivalences for scripts like Gothic and Old Italic, and
    > perhaps Indic scripts - which presumably cannot now be added for
    > stability reasons.

    Correct.

    > >> and with interleaved collation,
    > >
    > >
    > > Which was rejected for the default template (and would go against the
    > > practices already in place in the default template) but is available
    > > to you in your tailorings.
    >
    > Again, a matter of "we haven't done it this way before".

    I don't like the notion of interleaving in the default weighting
    table, and have spoken against it, but as John Cowan has pointed
    out, it is at least feasible. It doesn't have the ridiculousness
    factor of the compatibility decomposition approach.
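
    For what a tailoring looks like in practice, here is a sketch
    using PyICU (my choice of binding; any ICU binding would do, and
    the Phoenician code point assumes the proposed U+10900.. block):

        from icu import RuleBasedCollator

        # Tailor PHOENICIAN LETTER ALF (proposed U+10900) to sort as
        # equal to HEBREW LETTER ALEF (U+05D0). The default table is
        # untouched; the interleaving lives only in this tailoring:
        collator = RuleBasedCollator("&\u05D0 = \U00010900")
        assert collator.compare("\u05D0", "\U00010900") == 0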

    > >> also encoding as variation sequences,
    > >
    > >
    > > Which was rejected by Ken and others for good technical reasons, not
    > > the least of which was the p%r%e%p%o%s%t%e%r%o%u%s%n%e%s%s% of
    > > interleaving Hebrew text in order to get Phoenician glyphs.
    >
    >
    > I don't like this one myself either.

    So can we please just drop it?

    > But I disagree on
    > *preposterousness*. You consider this preposterous because you
    > presuppose that these are entirely different scripts. Others consider it
    > preposterous *not* to interleave Phoenician and Hebrew because they
    > understand these to be glyph variants of the same script. For, as John
    > Hudson has put it so clearly, for these people Phoenician and Hebrew
    > letters are the same abstract characters, in different representations.

    This is just restating the basic disagreement, for the umpteenth time.

    > It is clear to me that Phoenician is *not* an entirely separate script.
    > It seems to me that it comes somewhere between being the same script and
    > being a separate one. (In other words, I don't entirely accept either of
    > the strong traditions of scholarship.) Therefore complete separation is
    > inappropriate, although I don't insist on complete unification.

    O.k., so far, so good...

    > So I am
    > looking for a technical solution which comes somewhere between these two
    > extremes, which officially recognises the one-to-one equivalence between
    > Phoenician and (a subset of) Hebrew while making a plain text
    > distinction possible for those who wish to make it.

    The technical solution for that is:

    A. Encode Phoenician as a separate script. (That accomplishes the
       second task, of making a plain text distinction possible.)

    B. Assert in the *documentation* that there is a well-known
       one-to-one equivalence relationship between the letters of this
       script (and of the other 22CWSA -- the 22-character West Semitic
       alphabets) and the Hebrew letters -- including publication of
       the mapping tables as proof of concept.

    People (up to and including OS manufacturers, if they so choose)
    can then make use of B in developing collation tables, search
    algorithms, transliterations, or other kinds of equivalencing.
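
    As a proof of concept for B, here is what such a mapping table
    could look like in Python (assuming the proposed Phoenician block
    at U+10900..U+10915; the Hebrew side skips the five positional
    final forms, which have no Phoenician counterparts):

        # Hebrew final forms have no Phoenician counterparts:
        HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
        hebrew_base = [chr(cp) for cp in range(0x05D0, 0x05EB)
                       if cp not in HEBREW_FINALS]

        # 22 Phoenician letters (proposed U+10900 ALF .. U+10915 TAU),
        # paired one-to-one with the 22 Hebrew base letters; both
        # blocks follow the same traditional alphabetical order:
        PHOENICIAN_TO_HEBREW = {
            chr(0x10900 + i): heb for i, heb in enumerate(hebrew_base)}

        def fold_to_hebrew(text):
            """Fold Phoenician letters to Hebrew, e.g. for searching."""
            return "".join(PHOENICIAN_TO_HEBREW.get(ch, ch)
                           for ch in text)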

    Where I part company, however, is with the assumption that the
    recognition of an equivalence has to be *further* baked into some
    normative mechanism of the Unicode Standard itself. Attempting to
    force this
    into normative behavior via compatibility decompositions or
    variation sequences is likely to result in *worse* consequences,
    in my opinion. The proper way forward is simply an assertion
    (and publication) of appropriate equivalence relationships
    for particular fields, and then getting on with the task of
    programming accordingly.

    > > The technical solutions you have proposed have been inadequate.
    >
    >
    > Can you suggest one which is more adequate? Or in fact are you
    > determined to reject any solution, using doubtful technical arguments
    > against the details because you have failed to produce convincing
    > arguments against the principle?

    Michael is correct. But don't expect *him* to provide you with
    all the nitty-gritty dirt from *inside* the library, OS,
    database, and application vendors' code, because he isn't that
    level of implementer.

    It would be more appropriate to direct such questions to the
    people who actually write and implement normalization, collation,
    transliteration, folding, and other kinds of equivalencing
    operations in shipping software.

    > >... the issue of
    > >whether the 22 basic Semitic letters can also be represented in
    > >a Phoenician script or not pales to the minor molehill it actually
    > >is, in my opinion.

    >
    > Obviously a lot of people disagree with you on this one, Ken.

    Of course they do. Otherwise Dean wouldn't be harping on this
    point over and over.

    But I have *seen* mountains. The equivalencing problem for
    Hangul is a significant foothill, at least. The equivalencing and
    variation problem for Han is a genuine mountain range.

    The equivalencing of 22 Phoenician letters, one-to-one against
    Hebrew characters, where the mapping is completely known and
    uncontroversial, is a minor molehill.

    --Ken


