Re: Confusion about weak and strong disunification

From: verdy_p (verdy_p@wanadoo.fr)
Date: Mon Aug 17 2009 - 05:56:37 CDT


    "Asmus Freytag" wrote:
    > For Cyrillic, many character sets exist (and have existed for a long
    > time, even prior to Unicode) that contain _both_ the Latin alphabet and
    > the Cyrillic alphabet. The shape "a" occurs in both alphabets, and has
    > been encoded using two character codes. On the other hand the shape "z"
    > is thought to occur only in one alphabet (the Latin) and is coded only
    > once. If some not-so-well-known language has been written in Cyrillic,
    > but using the "z" shape, all digitally encoded documents created would
    > have to have used the "z" shape with the character code in the Latin
    > alphabet section of those character sets.

    A more convincing strong disunification is the one between the Latin capital letter A and the Greek capital letter
    ALPHA:
    * although they look similar, they behave differently when used with combining diacritics (but not when they are
    not really capitals, i.e. when they are actually small letters merely rendered in small capitals, where
    breathing/softening/hardening diacritics can still be stacked above them instead of being placed before them,
    preferably with kerning)
    * when capitals are used and encoded without distinction from small letters, converting capitals to small letters
    (possibly rendered as small capitals) is generally ambiguous in word-initial position when no dictionary lookup is
    available, in which case title-casing may still be preferable to avoid erroneous small letters. This is not
    specific to Greek, as it also occurs in Latin and Cyrillic, but in Greek the actual conversion to small letters
    (including small capitals) changes the rendering position of combining diacritics.
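
    To make the A / ALPHA disunification concrete, here is a minimal sketch in Python (my choice of language for
    illustration, it is not part of the thread). It only shows that the two look-alike capitals are separate
    characters and that each one lowercases within its own script; it says nothing about diacritic placement, which
    is a rendering matter.

```python
import unicodedata

latin_a = "\u0041"      # LATIN CAPITAL LETTER A
greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

# The two capitals look alike but are distinct encoded characters.
names = (unicodedata.name(latin_a), unicodedata.name(greek_alpha))

# Because they are disunified, lowercasing needs no language or dictionary
# information: each capital maps to the small letter of its own script.
lowered = (latin_a.lower(), greek_alpha.lower())

print(names)
print([f"U+{ord(c):04X}" for c in lowered])
```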

    And for rendering non-modern Greek, there is also a possible ambiguity in the conversion of capital iotas to
    subscript iotas when they occur after a vowel:
    * this conversion generally applies, except when the iota is the initial letter of a radical and has to be spelled
    distinctly;
    * this is clearly a case where another specific disunification is needed for the iota subscript, a distinction
    that is unfortunately not preserved when converting to capitals;
    * for me this is a case of "hard" disunification, because it is used to express genuinely contrasting phonetic
    and morphological differences.
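
    The loss of the iota subscript on capitalization can be observed with the default Unicode case mappings. The
    sketch below assumes Python's built-in str.upper() and str.lower(), which apply those default mappings without
    any dictionary or language tailoring; the variable names are mine.

```python
small = "\u1fb3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI (iota subscript)

# Full uppercasing turns the subscript iota into a regular capital iota.
upper = small.upper()

# Lowercasing again cannot tell that the iota used to be a subscript.
round_trip = upper.lower()

print([f"U+{ord(c):04X}" for c in upper])       # alpha followed by iota
print([f"U+{ord(c):04X}" for c in round_trip])  # plain alpha + plain iota
```

    After the round trip, the text contains an ordinary small iota following the vowel, so the original spelling
    can only be recovered with outside knowledge of the word.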

    Similar issues occur where a disunification is needed in some case pairs (like the sigma letters) between initial,
    medial or final forms: converting small letters to capitals loses these distinctions. However, it is not really
    clear that this creates a contrasting phonetic or morphemic difference in the case of the letter sigma. For me this
    is a "soft" disunification, as the difference is principally one of glyphs, where the typographic rules used to
    choose between them are not very accurate without a dictionary lookup (there are cases in Greek where final forms
    still need to be rendered rather than initial/medial forms, even though the sigmas occur in the middle of a word).
    (We could also discuss the alternate letter forms for the Greek small letters phi and theta.)
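
    The sigma case can be sketched the same way. The example assumes Python's str.lower(), which applies the
    positional final-sigma rule but has no dictionary; the spelling with a final sigma in the middle of a word is a
    made-up illustration of the situation described above, not an attested form.

```python
medial_sigma = "\u03c3"  # GREEK SMALL LETTER SIGMA
final_sigma = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA

# Both small forms share a single capital, so uppercasing merges them.
same_capital = medial_sigma.upper() == final_sigma.upper() == "\u03a3"

# Lowercasing picks the final form purely from the position in the word.
lowered = "ΟΔΟΣ".lower()

# Hypothetical spelling keeping a final sigma inside the word:
# the positional rule cannot restore it after a round trip through capitals.
illustrative = "προςφερω"
restored = illustrative.upper().lower()

print(same_capital, lowered, restored)
```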

    The case is much less clear (is it a "soft" or a "hard" disunification?) for the "similar" case of the Latin
    "long s" in non-modern languages, because of its use in Germanic languages: in Old French and English, these
    differences were generally just recommendations, not always observed even by the best scribes and authors. In
    Germanic languages, however, the "long s" form (normally used in initial/medial positions and not in final
    positions) can create ligatures with distinct semantics (such as the "sharp s", whose meaning becomes ambiguous
    between "ss" and "sz"). This may be fully resolved through the explicit encoding of ligature controls, in which case
    the automatic conversion of every initial or medial letter s to the long form (including its ligatures) becomes
    possible.

    But the ligature controls (joiners/disjoiners) are just hints for renderers: they are not always implemented by
    renderers, which can safely ignore them as if they were not present, and they are not systematically encoded where
    appropriate in most texts. Given also that the "rules" governing their correct use are language-dependent, it
    becomes clear that the disunification is needed. Unfortunately, here too, the distinction between the
    initial/medial "long s" and the "normal" s that should always be used in final positions (including positions
    before explicit ligature disjoiners) is lost after conversion to capitals, and cannot be restored safely, except
    when explicit ligature controls are used, or when the actual language is known and is permissive (such as Old
    English and the old Romance languages) and does not need this distinction for correct rendering and
    interpretation.
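
    A last sketch for the "long s", again assuming Python's default case mappings. The German word pair is only an
    illustration chosen by me (a compound whose reading depends on where the morpheme boundary falls), and the
    disjoiner used is U+200C ZERO WIDTH NON-JOINER.

```python
import unicodedata

guard_room = "Wach\u017ftube"      # long s opens the second element (Wach + Stube)
wax_tube = "Wachstube"             # round s closes the first element (Wachs + Tube)
with_disjoiner = "Wachs\u200ctube" # same boundary marked by a ligature disjoiner

# Uppercasing maps the long s to plain S: both spellings collapse.
collapsed = guard_room.upper() == wax_tube.upper()

# Lowercasing again never brings the long s back.
round_trip = guard_room.upper().lower()

# The disjoiner is a format character (category Cf) and survives case
# conversion, but a renderer remains free to ignore it.
zwnj_category = unicodedata.category("\u200c")
upper_with_disjoiner = with_disjoiner.upper()

# The sharp s uppercases to "SS", hiding whether "ss" or "sz" was meant.
sharp_s_upper = "\u00df".upper()

print(collapsed, round_trip, zwnj_category, sharp_s_upper)
```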


