Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Jun 04 2004 - 16:21:52 CDT

    On 25/05/2004 12:14, Kenneth Whistler wrote:

    >Peter,
    >
    >
    >
    >>>>There is no consensus that this Phoenician proposal is necessary. I
    >>>>and others have also put forward several mediating positions e.g.
    >>>>separate encoding with compatibility decompositions
    >>>>
    >>>>
    >>>Which was rejected by Ken for good technical reasons.
    >>>
    >>>
    >>I don't remember any technical reasons, it was more a matter of "we
    >>haven't done it this way before".
    >>
    >>
    >
    >The *reason* why we haven't done it this way before is because
    >it would cause technical difficulties.
    >
    >

    I am revisiting this one because I realise now that Ken has been
    somewhat economical with the truth here. There ARE cases in which entire
    alphabets have been given compatibility decompositions to other
    alphabets. For example there are the Mathematical Alphanumeric Symbols,
    the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as
    well as superscripts, subscripts, modifier letters etc. These symbols
    have these compatibility decompositions because they are not considered
    to form a separate script, but rather to be glyph variants of characters
    in the Latin, Greek, Katakana etc. scripts. Do these compatibility
    decompositions cause technical difficulties?
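
    A minimal sketch of the existing behaviour, using Python's standard
    unicodedata module. The sample characters and the helper name
    compat_fold are my own choices for illustration. Each sample carries a
    tagged compatibility decomposition, so compatibility normalisation
    (NFKC) folds it to the base character while canonical normalisation
    (NFC) leaves it alone.

```python
import unicodedata

# Sample characters that already carry compatibility decompositions,
# paired with the base character each one is expected to fold to.
SAMPLES = {
    "\U0001D400": "A",   # MATHEMATICAL BOLD CAPITAL A
    "\uFF21": "A",       # FULLWIDTH LATIN CAPITAL LETTER A
    "\u2460": "1",       # CIRCLED DIGIT ONE
    "\u00B2": "2",       # SUPERSCRIPT TWO
    "\uFF71": "\u30A2",  # HALFWIDTH KATAKANA LETTER A -> KATAKANA LETTER A
}


def compat_fold(text):
    """Apply compatibility normalisation (NFKC) to text."""
    return unicodedata.normalize("NFKC", text)


for ch, base in SAMPLES.items():
    # The decomposition field starts with a tag such as <font> or <wide>.
    tag = unicodedata.decomposition(ch).split()[0]
    # Compatibility normalisation folds the variant to its base character.
    assert compat_fold(ch) == base
    # Canonical normalisation keeps the variant distinct.
    assert unicodedata.normalize("NFC", ch) == ch
    print(unicodedata.name(ch), tag)
```

    The tag in each decomposition records what kind of variant the
    character is, so the distinction is kept in the data even though
    compatibility normalisation discards it.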

    >Compatibility decompositions directly impact normalization.
    >
    >

    Of course. And the point of suggesting compatibility decomposition here
    is precisely so that compatibility normalisation, as well as default
    collation, folds together Phoenician and Hebrew variant glyphs of the
    same script.

    >Cross-script equivalencing is done by transliteration algorithms,
    >not by normalization algorithms.
    >
    >

    This begs the question. Scholars of Semitic languages do not accept that
    this is a cross-script issue. They do not accept that representing a
    Phoenician, palaeo-Hebrew etc. inscription with square Hebrew glyphs is
    transliteration. Rather, for them it is a matter of replacing an
    obsolete or non-standard glyph by a modern standard glyph for the same
    character - just as one would not call it transliteration to set in
    Times New Roman a Latin script text written in mediaeval handwriting or
    in Fraktur.

    >If you try to blur the boundary between those two by introducing
    >compatibility decompositions to equate across separately encoded
    >scripts, the net impact would be to screw up *both* normalization
    >and transliteration by conflating the two. You
    >would end up with confusion among both the implementers of
    >such algorithms and the consumers of them.
    >
    >
    >

    I would suggest that a clear distinction should be made, in an
    appropriate part of the Unicode Standard, between transliteration
    (between separate scripts) and what one might call glyph normalisation
    (between variant forms of the same script).

    >>But perhaps that is only because the
    >>need to do this has not previously been identified.
    >>
    >>
    >
    >No, that is not the case.
    >
    >
    >
    >>However, I can make
    >>a good case for the new Coptic letters being made compatibility
    >>equivalent to Greek - which can still be done, presumably -
    >>
    >>
    >
    >But will not be done. If you attempted to make your case, you
    >would soon discover that even *if* such cross-script equivalencing
    >via compatibility decompositions were a good idea (which it isn't),
    >you would end up with inconsistencies, because some of the Coptic
    >letters would have decompositions and some could not (because they
    >are already in the standard without decompositions). You'd end
    >up with a normalization nightmare (where some normalization
    >forms would fold Coptic and Greek, and other normalization
    >forms would not), while not having a transliteration solution.
    >
    >

    This is not intended as a transliteration solution. It is intended to
    recognise that *some* Coptic letters are glyph variants of Greek
    letters, as previously recognised by the UTC, whereas *others* are not.
    As a result only the former set would have compatibility decompositions
    - and as it happens those are precisely the ones which are proposed for
    new encoding, and so for which compatibility decompositions can still be
    defined. This also has the major advantage that it folds together, for
    normalisation and default collation, texts which have been encoded
    according to the existing definitions for Coptic and those which will be
    encoded according to the new definitions.

    But I accept that this Coptic to Greek compatibility has a few problems,
    because not all Coptic characters have mappings. However, this is not a
    problem for Phoenician, because *every* Phoenician character has an
    unambiguous compatibility mapping to an existing Hebrew character.
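
    To show how small the one-to-one table is, here is a rough sketch of
    the folding as a plain lookup. It is only an illustration, not a
    definition of any normalisation form. I am assuming that the 22
    Phoenician letters are allocated contiguously, in traditional
    alphabetical order, starting at U+10900; that base and the names
    PHOENICIAN_BASE, FOLD and glyph_fold are assumptions for the example.
    The Hebrew targets are the 22 non-final letters in U+05D0..U+05EA.

```python
# ASSUMPTION: 22 Phoenician letters allocated contiguously from this base,
# in traditional alphabetical order. Illustrative only.
PHOENICIAN_BASE = 0x10900

# The five Hebrew final forms have no separate Phoenician counterpart.
HEBREW_FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}

# The 22 non-final Hebrew letters, alef through tav.
HEBREW_LETTERS = [cp for cp in range(0x05D0, 0x05EB)
                  if cp not in HEBREW_FINALS]

# One-to-one table: assumed Phoenician code point -> Hebrew code point.
FOLD = {PHOENICIAN_BASE + i: cp for i, cp in enumerate(HEBREW_LETTERS)}


def glyph_fold(text):
    """Replace each (assumed) Phoenician letter by its Hebrew equivalent."""
    return text.translate(FOLD)


# A word keyed in either form compares equal after folding.
phoenician_word = "".join(chr(PHOENICIAN_BASE + i) for i in (0, 1))  # alef, bet
hebrew_word = "\u05D0\u05D1"
assert glyph_fold(phoenician_word) == glyph_fold(hebrew_word)
```

    Text already in Hebrew passes through unchanged, so the fold can be
    applied to both sides of a comparison without knowing which form was
    used.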

    >...
    >I don't like the notion of interleaving in the default weighting
    >table, and have spoken against it, but as John Cowan has pointed
    >out, it is at least feasible. It doesn't have the ridiculousness
    >factor of the compatibility decomposition approach.
    >
    >

    If what I have suggested is ridiculous, so is what the UTC has already
    defined for Mathematical Alphanumeric Symbols.

    >...
    >The equivalencing of 22 Phoenician letters, one-to-one against
    >Hebrew characters, where the mapping is completely known and
    >uncontroversial, is a minor molehill.
    >
    >

    Well, why not make these uncontroversial equivalents, between variant
    glyphs for the same script, compatibility decompositions?

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    