Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jan 26 2007 - 11:59:05 CST


    >
    > Now let's say that I have a text in typical modern German, which I
    > decide I want to display in blackletter type (noting your accurate
    > objection to use of the specific term fraktur). What degree of this
    > conversion should I be able to rely on to be automated, and what
    > degree will require editorial intervention *in the text*? I don't know
    > the answer to that question, and I suspect it is something that could
    > generate a good deal of debate.
    >
    The rules for the use of long s, and for ligatures (in German), both
    require that you know the word boundaries inside a compound word. As has
    been demonstrated on this list many times, there are cases where even
    dictionary-based approaches must fail, because the same string of
    letters can represent two different compound words (with the boundary
    in different locations).
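
    [A commonly cited example of such an ambiguity is "Wachstube", which
    can be read as Wach|stube ("guard room") or Wachs|tube ("tube of
    wax"). The following sketch, not from the original mail, shows why a
    dictionary of plain letter strings cannot decide between the two:
    inserting ZWNJ at the boundary records the intended split (and blocks
    ligature and long-s rules across it), but stripping ZWNJ collapses
    both readings back to the same string.]

    ```python
    # Sketch (illustrative, not from the original mail): the ambiguous
    # German compound "Wachstube" splits as Wach|stube ("guard room") or
    # Wachs|tube ("tube of wax"). U+200C ZERO WIDTH NON-JOINER at the
    # boundary encodes the split, but the bare letter sequence is the
    # same either way, so letter strings alone cannot disambiguate.
    ZWNJ = "\u200c"

    def mark_boundary(word: str, index: int) -> str:
        """Insert ZWNJ at a compound-word boundary."""
        return word[:index] + ZWNJ + word[index:]

    guard_room = mark_boundary("Wachstube", 4)   # Wach|stube
    wax_tube   = mark_boundary("Wachstube", 5)   # Wachs|tube

    # The encoded forms differ...
    assert guard_room != wax_tube
    # ...but stripping ZWNJ makes them indistinguishable again.
    assert guard_room.replace(ZWNJ, "") == wax_tube.replace(ZWNJ, "")
    ```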

    Further, the use of Antiqua for "foreign" words holds more firmly than a
    similar rule for use of italics in modern English for the same purpose.
    This requires more dictionary-based support, and is complicated by the
    fact that it does not extend to names.

    A spell checker could help you with the long/short s, but unless you
    use *character codes* (such as ZWNJ) to prohibit the ligatures, and
    make sure that your interface to the spell checker does not suppress
    ZWNJ, it would be of little help in sorting out the prohibited
    ligatures.

    If you had a system that tapped into the hyphenation data (modified to
    distinguish compound-word boundaries from mere syllable boundaries) to
    disable ligature formation at least at the default word components,
    then your task of text intervention could be reduced to the exceptional
    cases. However, note that this requires an interaction between
    components that is absent from current architectures, and it requires
    that the dictionary-based data be tweaked, at least, for the purpose.
    Both are non-trivial. (In theory, the ligature problem exists in modern
    texts as well, albeit to a lesser degree due to fewer ligature pairs in
    antiqua fonts, but it is generally unsupported.)
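
    [The idea above might be sketched as follows. The boundary data here
    is a hand-written stand-in for the kind of augmented hyphenation
    dictionary described, distinguishing compound/morpheme boundaries,
    where a ligature must be broken, from mere syllable breaks, where it
    may remain; the word list and function names are illustrative
    assumptions, not real dictionary data.]

    ```python
    # Sketch under assumed data: a hyphenation dictionary extended so
    # that compound/morpheme boundaries (where ligatures must be broken)
    # are distinguished from mere syllable breaks (where they may stay).
    # The BREAKS table is a hand-written stand-in, not real data.
    ZWNJ = "\u200c"

    # word -> list of (index, is_compound_boundary)
    BREAKS = {
        "Auflage":   [(3, True)],    # Auf|lage: break the fl ligature
        "Kaufleute": [(4, True)],    # Kauf|leute
        "Pflanze":   [(5, False)],   # Pflan-ze: syllable break only
    }

    def suppress_ligatures(word: str) -> str:
        """Insert ZWNJ at compound boundaries so fonts won't ligate there."""
        pieces, prev = [], 0
        for index, is_compound in BREAKS.get(word, []):
            if is_compound:
                pieces.append(word[prev:index])
                prev = index
        pieces.append(word[prev:])
        return ZWNJ.join(pieces)

    assert suppress_ligatures("Auflage") == "Auf\u200clage"
    assert suppress_ligatures("Pflanze") == "Pflanze"  # unchanged
    ```

    [The exceptional cases the mail mentions, where the dictionary's
    default split is wrong for a given text, would then still need
    hand-placed ZWNJ in the text itself.]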

    An ordinary spell checker would have trouble with the foreign-words
    convention, since spell checkers usually don't interact with font markup.

    There's no debate that the amount of text intervention would be
    considerable, that there are definite limits to what you can do (or
    assist the user with) by software, and that doing even that would
    require considerable modifications/adjustments to existing architectures
    and dictionary data.

    A./



    This archive was generated by hypermail 2.1.5 : Fri Jan 26 2007 - 12:02:39 CST