Re: Combining marks with two letters

From: James Kass (thunder-bird@earthlink.net)
Date: Tue Feb 12 2008 - 03:53:58 CST

    Philippe Verdy wrote,

    > James Kass wrote:
    >> Making multiple zero-width zero-contour glyphs so that different
    >> mark classes can essentially be assigned to the same
    >> *character* strikes me as a beautiful hack.
    >
    > I don't understand something in your sentence: where are the multiple
    > zero-width zero-contour glyphs? I can't even find any one ...

    That's because they're invisible.

    > ...(not even CGJ that
    > is not a glyph and cannot be assigned any glyph, not even an empty one, in
    > normal mode, i.e. not visible controls mode).

    CGJ is assigned to a zero-contour zero-width glyph in fonts which
    support the CGJ character.

    >
    > If you speak about characters that are zero-width zero-contours and that can
    > have multiple mark classes, I can't find any one.

    Zero width means it has no width at all, so the cursor/caret/pen doesn't move,
    and zero contour means that there is no outline data. Put zero width together
    with zero contour, and you get an invisible glyph. It's hard to find that
    which can't be seen.

    > CGJ, as a character has a
    > single combining class (i.e. zero), and there are no other invisible
    > combining marks (just marks that may become "invisible" by forming ligatures
    > or by modifying the glyph form of a base character or grapheme cluster such
    > as viramas/halants).
    >
    > Your sentence makes little sense to me.

    That's because what John was originally discussing involves OpenType
    tweaks. It's a bit complicated because the jargon used in OpenType
    typography is specialized.

    John Hudson had written about using OpenType contextual substitution
    to substitute alternate zero-width no-contour glyphs for the default
    zero-width no-contour glyph mapped to CGJ. Additional zero-width
    no-contour glyphs, contextually substituted, would, of necessity, all
    have different GIDs. This means that they could be assigned different
    classes in the font. Assigning them different classes in the font
    means that you could have CGJ getting looked up as an "above mark" in
    one place, while having CGJ get looked up as a "below mark" in another
    place. It's a beautiful hack.

    GID = glyph ID, the ordered position of a glyph's data in the font
    GPOS = glyph positioning, handles placement of one glyph relative to another one
    GSUB = glyph substitution, substitutes a new glyph for a string of glyphs.
    (A string of glyphs can be just one glyph.)
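A rough sketch of the idea in Python, for anyone who finds code easier to follow than font jargon. Everything here is a toy model and an assumption on my part: the glyph names, the GID numbers, the `MARK_CLASS` table and the context rule in `shape()` are hypothetical, and no real font or OpenType library is involved. It only shows why two invisible glyphs with different GIDs can carry different mark classes while both stand for the same CGJ character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    gid: int                # glyph ID: ordered position of the glyph's data
    name: str
    advance_width: int = 0  # zero width: the pen does not move
    contours: tuple = ()    # zero contour: no outline data


# Hypothetical toy font. "cgj" is the default glyph mapped to U+034F;
# "cgj.below" is an alternate reachable only through substitution.
GLYPHS = {
    "a": Glyph(1, "a", 500, ("outline",)),
    "acutecomb": Glyph(2, "acutecomb", 0, ("outline",)),
    "dotbelowcomb": Glyph(3, "dotbelowcomb", 0, ("outline",)),
    "cgj": Glyph(4, "cgj"),
    "cgj.below": Glyph(5, "cgj.below"),
}

# Character-to-glyph mapping: one character, one default glyph.
CMAP = {0x0061: "a", 0x0301: "acutecomb", 0x0323: "dotbelowcomb", 0x034F: "cgj"}

# Mark classes are assigned per glyph, not per character.
MARK_CLASS = {
    "acutecomb": "above",
    "dotbelowcomb": "below",
    "cgj": "above",
    "cgj.below": "below",
}


def is_invisible(glyph):
    """True for a zero-width glyph that has no outline data."""
    return glyph.advance_width == 0 and not glyph.contours


def shape(text):
    """Map characters to glyph names, then apply one contextual substitution.

    Assumed rule for illustration: a CGJ glyph that directly follows a
    below mark is replaced by the 'cgj.below' alternate.
    """
    glyphs = [CMAP[ord(ch)] for ch in text]
    out = []
    for i, name in enumerate(glyphs):
        if name == "cgj" and i > 0 and MARK_CLASS.get(glyphs[i - 1]) == "below":
            name = "cgj.below"
        out.append(name)
    return out
```

With this toy table, `shape("a\u0301\u034f")` keeps the default `cgj` glyph (class "above"), while `shape("a\u0323\u034f")` ends in `cgj.below` (class "below"). Both glyphs are invisible, but they have different GIDs and so can be treated differently by later positioning lookups.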

    > We are speaking about the effect of a single character (CGJ) that is
    > combining and always invisible by itself (it does not change the base letter
    > or any combining character encoded before or after it, it just controls
    > their relative ordering in the final layout, when multiple alternatives
    > would be otherwise possible and the encoding prefers/dictates the logical
    > ordering produced by the canonical order). Once the grapheme clusters have
    > been delimited, then the canonical ordering has been computed to each
    > grapheme cluster by the renderer, then BiDi ordering applied to the
    > graphemes, then CGJ dropped, no further reordering is allowed and fonts will
    > just select the appropriate glyphs in the specified order by transforming
    > sequences of characters to streams of glyph ids (with some complication for
    > two-part combining vowels that have parts with different positioning classes
    > and that require initial transformation), then combining them with GSUB,
    > mostly using pairs, then positioning them with GPOS or mark-to-mark
    > positioning.

    1. Mark-to-mark positioning is itself part of GPOS (it is one of the GPOS
    lookup types), not something separate from it.
    2. If the CGJ function is only to prevent canonical re-ordering, then there
    would never be a need for a rendering engine to use it as part of a font
    look-up.
    3. T.U.S. 5.0 pages 540-541 does say that CGJ only blocks re-ordering.
    4. I would have figured out what the character is supposed to do a lot
    quicker if it had been named COMBINING RE-ORDERING BLOCKER.
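The re-ordering block in points 2 and 3 can be seen without any font at all. Here is a small sketch using Python's standard `unicodedata` module. The choice of marks (COMBINING ACUTE ACCENT, class 230, and COMBINING DOT BELOW, class 220) is just an example I picked; the variable names are mine.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220


def nfd(text):
    """Canonical decomposition, which includes canonical re-ordering."""
    return unicodedata.normalize("NFD", text)


# Without CGJ the two marks are adjacent, so normalization sorts them
# by combining class and the dot below moves in front of the acute.
plain = "a" + ACUTE + DOT_BELOW
reordered = nfd(plain)

# With CGJ between them, the class-0 character separates the marks,
# so normalization leaves the typed order alone.
blocked = "a" + ACUTE + CGJ + DOT_BELOW
kept = nfd(blocked)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in kept])
```

The first sequence comes back with the two marks swapped; the second comes back exactly as entered, CGJ included.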

    > Most of the preliminary steps that are best handled by the renderer itself,
    > will not need to be specified in fonts, and this includes all the question
    > related to the conversion from the logical ordering to the visual ordering
    > (there will be a few known exceptions for some combining characters whose
    > glyph form and positioning class vary according to the base character, and
    > for which GSUB lookup will be really needed).

    The philosophy of OpenType is that things which are specific to the script
    are handled by the rendering engine (for example, vowel sign re-ordering), and
    things which are specific to the font are handled by the font (for example,
    the inclusion of ligatures and how/where in the font to find them).

    Best regards,

    James Kass


