Re: Medievalist ligature character in the PUA

From: verdy_p (verdy_p@wanadoo.fr)
Date: Tue Dec 15 2009 - 06:51:40 CST

    "Jeroen Ruigrok van der Werven" wrote:
    > Actually ij is unbreakable from a language point of view. You cannot
    > hyphenate any words using it like blijdschap into bli-jdschap. I think the
    > Dutch problem of using ij/IJ/y/Y for the ij comes from the fact we have been
    > using US English keyboards for a long time now.

    Wasn't it "ÿ/Y" rather than "y/Y" (keeping the two soft dots on the lowercase part of that ligature, as they are
    present in the handwritten form)?

    And isn't there also "Ÿ", due to the default case mapping of "ÿ" in Unicode (thus transforming the two soft dots
    into hard dots), even though "Ÿ" was not available in ISO 8859-1?
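
    A quick way to check both points, as a minimal sketch in Python (the helper name fits_latin1 is only an
    illustration, not part of any library): the default case mapping turns U+00FF into U+0178, and that uppercase
    form cannot be encoded in ISO 8859-1.

```python
lower = "\u00ff"        # ÿ LATIN SMALL LETTER Y WITH DIAERESIS
upper = lower.upper()   # default, locale-independent Unicode case mapping


def fits_latin1(text):
    """Return True if every character of `text` exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(f"U+{ord(upper):04X}")   # code point of the uppercase form
print(fits_latin1(lower))      # the lowercase letter is in ISO 8859-1
print(fits_latin1(upper))      # the uppercase letter is not
```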

    And wasn't there a recent orthographic reform in Belgium or the Netherlands that allows writing words without the
    ligature? Even if there is no hyphenation between the two letters (which a Dutch-aware process would prohibit
    anyway in the presence of ZWJ), i+ZWJ+j could still be perfect for Dutch, suggesting that this is an optional
    ligature.

    Are there cases in some languages, where letters linked with ZWJ to form a suggested ligature can still be separated
    by hyphenation? Yes:

    In the purely typographical ligatures such as "ff"/"ffl"/"ffi", where the suggested ligature between the two "f"
    is still possible, and even in the case of "ffi"/"fi"/"ffl"/"fl", where a syllable break is still possible after
    the last "f" (in English or German, but not in French), or "ct"/"st"/"ſt", where syllable breaks exist in most
    latinized languages.

    There are other common ligatures: "ſl"/"ſi".

    In handwritten forms or in cursive typography, the links also have contextual positions and can even alter the glyph
    of the preceding or following letters (notably with lowercase letters that have descenders, but also some pairs like
    "bi" or "oi" or "ol", where the "i" or "l" connects on the x-height line instead of the baseline). These are
    generally not real ligatures, but contextual forms (much as in Arabic, where they are required), some of them
    depending on whether the letters are initial or not, others on whether they are final or not, and some on
    specific pairs.

    ZWJ is then not an indicator of word hyphenation positions, but this is also true for all other letters. These
    positions are simply not encoded in Unicode: by default, if you don't know the language precisely, there's no
    hyphenation at all in the middle of words, except if you use the soft hyphen (SHY) control explicitly. For
    effectively applying hyphenation you need to know the language, its general rules, and a dictionary for lists of
    exceptions:

    The presence or absence of ZWJ in ligatures (or of ZWNJ in non-ligatures) will have no effect: hyphenators should
    ignore ZWJ and ZWNJ when they are present, and MUST effectively drop them before rendering lines if a hyphenation
    occurs at their position.
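
    The rule above can be sketched as follows. This is only an illustration under stated assumptions: the break
    position is supposed to come from some language-aware hyphenator that worked on the word with the joiners
    removed, and the names strip_joiners and break_word are hypothetical, not taken from any existing library.

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER
JOINERS = {ZWJ, ZWNJ}


def strip_joiners(word):
    """Return the word as a hyphenator should see it: without ZWJ/ZWNJ."""
    return "".join(ch for ch in word if ch not in JOINERS)


def break_word(word, letter_index):
    """Split `word` just before its `letter_index`-th letter.

    Letters are counted as in strip_joiners(word), so joiners never shift
    the break position. Joiners sitting at the break are dropped; joiners
    elsewhere in the word are kept untouched.
    """
    if letter_index <= 0:
        raise ValueError("break position must be inside the word")
    letters_seen = 0
    for pos, ch in enumerate(word):
        if ch in JOINERS:
            continue
        if letters_seen == letter_index:
            head = word[:pos].rstrip(ZWJ + ZWNJ)   # drop joiners at the break
            return head + "-", word[pos:]
        letters_seen += 1
    raise ValueError("break position must be inside the word")


word = "of" + ZWJ + "f" + ZWJ + "ice"   # ligatures suggested for f+f and f+i
first_line, second_line = break_word(word, 2)
```

    Here the hyphenator would have been given "office"; breaking before the third letter drops the ZWJ between the
    two "f", while the ZWJ suggesting the "fi" ligature stays on the second line.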

    This just demonstrates that it's up to renderers to manage ZWJ and ZWNJ correctly, and not up to fonts to include
    support for them. Fonts, on the other hand, MAY map glyphs for them, which will be displayed only when using the
    "show hidden controls" mode (in which ligatures suggested by ZWJ should probably NOT be honored).

    When this mode is off, none of these "visible control" glyphs should ever be displayed, even if they are mapped in
    the main mapping table (codepoint to glyphID), and renderers should just use or not use the ligatures found in the
    font's "ligature" tables (which may also be tuned for some languages where they are forbidden): in OpenType, this
    means using font "features" and putting all substitution/positioning rules within the features table.
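
    As a rough sketch of the two modes (purely illustrative: plan_run and its return format are assumptions, and a
    real renderer would go on to apply the font's substitution rules to the result):

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER


def plan_run(text, show_controls=False):
    """Decide what a renderer would hand over to the font for one text run.

    Returns (chars, hints):
      chars: the characters whose glyphs will actually be looked up;
      hints: maps an index in `chars` to "join" or "non-join", meaning that
             a ligature is suggested or refused between chars[i-1] and
             chars[i].
    In "show hidden controls" mode the controls are displayed as ordinary
    glyphs and no ligature hint is recorded for them.
    """
    chars = []
    hints = {}
    for ch in text:
        if ch in (ZWJ, ZWNJ):
            if show_controls:
                chars.append(ch)   # use the glyph the font maps for it
            else:
                hints[len(chars)] = "join" if ch == ZWJ else "non-join"
        else:
            chars.append(ch)
    return chars, hints


normal = plan_run("f" + ZWJ + "i")
visible = plan_run("f" + ZWJ + "i", show_controls=True)
```

    In the normal mode the run contains only "f" and "i", with a "join" hint between them; in the visible mode the
    ZWJ stays in the run as a glyph of its own and the ligature is not requested.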

    But anyway, I still think that it's best to use the glyphs mapped in the font for ZWJ and ZWNJ, instead of a generic
    glyph from a default internal font of the renderer, notably because the glyph forced by the renderer could have very
    bad metrics, not aligning correctly (possibly contextually) with the other glyphs coming from the font: a font may
    then still need to use substitution/positioning rules to properly position (and possibly reorder...) the control
    glyph within the text flow, when the "visible controls" mode is enabled in the renderer.

    Philippe.


