RE: Phetsarat font, Lao unicode

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Jul 11 2007 - 09:55:09 CDT


    James Kass wrote:
    > Sent: Tuesday, July 10, 2007 05:16
    > To: unicode@unicode.org
    > Subject: RE: Phetsarat font, Lao unicode
    >
    >
    > Philippe Verdy wrote,
    >
    > > One problem is that fonts (at least with TrueType/OpenType) are not designed
    > > to support reordering and positioning with an unbound number of base
    > > characters.
    >
    > Font engines handle reordering.

    Not completely, and not always. Some fonts have to use GSUB for local
    reordering according to style, rather than relying only on the script properties.

    > > For example the GSUB/GPOS tables in TrueType require listing
    > > somewhere the complete list of codepoints where such reordering and
    > > positioning may be applied, ...
    >
    > A listing of glyph IDs is stored in the font. Fonts only store
    > codepoints in the "cmap" table. The listing of glyph IDs may
    > be a complete list of every glyph ID involved, or it may be
    > done using ranges in order to minimize table size.
    > > ... something that can't be performed in fonts with
    > > the current format, because they don't allow defining character classes
    > > in them,
    >
    > The OpenType GDEF table format requires assignment of
    > glyphs to various character classes. These classes are neither
    > user- nor developer-definable, though. Unicode also assigns
    > character classes, but only to characters. Complex script
    > fonts generally have scads of "presentation form" glyphs
    > which aren't characters in the Unicode sense.

    I said "somewhere". You misunderstand what I mean here. I was speaking about
    the possibility of creating a group of code points (even if they are
    remapped internally to glyph IDs through a "cmap" table or other tables) and
    assigning them a single identifier that can be used in GSUB/GPOS rule
    tables. Without it, you have to create as many rules as there are pairs in the
    product of the possible base characters in one class and the possible
    combining vowel signs in another class. As there may be many candidate base
    characters needing such combinations, this will rapidly exhaust the maximum
    size allowed for such GSUB/GPOS tables.
    Designing GSUB/GPOS tables so that their selectors can include a pseudo-glyph
    ID mapped to a class of code points would greatly simplify the design of fonts
    that need to contain many characters (possibly from several or many
    scripts).
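
    To make the size argument concrete, here is a minimal Python sketch using a
    simplified, hypothetical rule model (one rule pairs one base selector with one
    mark selector); it is not the binary layout of OpenType GSUB/GPOS tables. The
    names pairwise_rules and class_rules are illustrative only.

```python
from itertools import product

# Simplified, hypothetical rule model: each rule pairs one base selector with
# one mark selector. This is NOT the binary layout of OpenType GSUB/GPOS.
BASES = ["\u0e81", "\u0e82", "\u0e84", "\u0e87", "\u0e88"]  # Lao KO, KHO SUNG, KHO TAM, NGO, CO
MARKS = ["\u0eb4", "\u0eb5", "\u0eb6"]                      # Lao vowel signs I, II, Y


def pairwise_rules(bases, marks):
    """One rule per (base, mark) pair: len(bases) * len(marks) rules in total."""
    return list(product(bases, marks))


def class_rules(bases, marks):
    """A single rule whose two selectors are whole classes (the 'pseudo-glyph ID' idea)."""
    return [(frozenset(bases), frozenset(marks))]


print(len(pairwise_rules(BASES, MARKS)))               # 15
print(len(class_rules(BASES, MARKS)))                  # 1
print(len(pairwise_rules(range(30), range(12))))       # 360: growth is multiplicative
```

    The pairwise count is the product of the two class sizes, while the
    class-based form stays at one rule however large the classes become.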

    > > ... and assigning them pseudo-glyph IDs that can be used in GSUB tables.
    >
    > Pseudo-glyph ID might be a misleading phrase. A Glyph ID is
    > simply the number of the position of a glyph's data in a font.
    > The first glyph, contrary to conventional counting methods,
    > is given the glyph ID of zero. And so forth.

    It was not misleading. I really meant a special ID that can designate
    a class of glyphs (mapped from a class of characters) as if it were
    a single glyph ID, so that a single composition rule suffices instead of
    one composition rule per element of the product of the two classes. It would
    certainly be more useful in GPOS than in GSUB.
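
    A sketch of how such a pseudo-glyph ID could be resolved when matching a
    rule, assuming a hypothetical convention (not an OpenType feature) in which
    any selector at or above 0x10000 names a class of glyphs. Since TrueType and
    OpenType glyph IDs are 16-bit, such values cannot collide with real glyph
    IDs. MarkRule, matches, find_rule and the glyph numbers are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical convention: selectors >= CLASS_BASE are pseudo-glyph IDs naming a
# class of glyphs. Real glyph IDs are 16-bit, so they are always below this value.
CLASS_BASE = 0x10000


@dataclass
class MarkRule:
    base_selector: int  # a glyph ID or a pseudo-glyph (class) ID
    mark_selector: int  # a glyph ID or a pseudo-glyph (class) ID
    dx: int             # horizontal offset applied to the mark
    dy: int             # vertical offset applied to the mark


def matches(selector, glyph_id, classes):
    """True if selector is this glyph ID, or a class ID whose class contains it."""
    if selector >= CLASS_BASE:
        return glyph_id in classes.get(selector, set())
    return selector == glyph_id


def find_rule(rules, base_gid, mark_gid, classes):
    """Return the first rule matching the (base, mark) glyph pair, or None."""
    for rule in rules:
        if (matches(rule.base_selector, base_gid, classes)
                and matches(rule.mark_selector, mark_gid, classes)):
            return rule
    return None


classes = {
    CLASS_BASE + 0: {10, 11, 12, 13},  # illustrative glyphs of base consonants
    CLASS_BASE + 1: {40, 41},          # illustrative glyphs of above-base vowel signs
}
rules = [MarkRule(CLASS_BASE + 0, CLASS_BASE + 1, dx=-300, dy=50)]

rule = find_rule(rules, 12, 41, classes)
print(rule.dx, rule.dy)                   # -300 50
print(find_rule(rules, 99, 41, classes))  # None
```

    One rule here covers all 4 x 2 base/mark combinations; adding a glyph to a
    class extends the rule without adding new rules.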

    > > ... the renderer
    > > for example could be looking for rules based on the dotted circle symbol,
    > > and automatically infer the other applicable rules for other Common symbols,
    >
    > Does this assume that the dotted circle is part of the encoded text?

    Yes, for the intended purpose of showing the diacritic in isolation, with an
    arbitrary base symbol.

    > It normally isn't, it's inserted (to the display only) by (at least) one
    > popular font engine.

    No. A renderer should not have to do this unless explicitly instructed to
    do so, or unless there is no other way to display the diacritic combined
    with an appropriate base character.
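
    A minimal Python sketch of that policy: a placeholder base is inserted only
    when a combining mark has nothing to attach to, here simplified (as an
    assumption) to "at the start of the text or right after whitespace". The
    function name add_placeholders is hypothetical; real shaping engines apply
    script-specific cluster rules rather than this one test.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def is_mark(ch):
    """Combining marks have general category Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")


def add_placeholders(text, placeholder=DOTTED_CIRCLE):
    """Insert `placeholder` before a combining mark that has no base character.

    Simplifying assumption: a mark lacks a base only at the start of the text or
    right after whitespace. A mark following another mark is left alone, because
    the first mark of the run already received the placeholder (or has a base).
    """
    out = []
    prev = None  # previous input character, None at the start
    for ch in text:
        if is_mark(ch) and (prev is None or prev.isspace()):
            out.append(placeholder)
        out.append(ch)
        prev = ch
    return "".join(out)


print(add_placeholders("\u0eb4") == "\u25cc\u0eb4")        # True: isolated Lao vowel sign I
print(add_placeholders("\u0e81\u0eb4") == "\u0e81\u0eb4")  # True: KO + vowel sign unchanged
```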

    But even in the case of, for example, a combining cedilla occurring after a
    Hebrew base letter, for which a font is very unlikely to implement a
    composition rule and for which the renderer will be of no help,
    displaying the uncomposable combining cedilla with a dotted circle is not
    the ultimate solution. Many renderers will instead attempt some reasonable
    default positioning, for example by centering the diacritic horizontally
    on the center of the base letter (the renderer will probably not be able
    to move the cedilla vertically or find a more appropriate place, given
    that this would depend on the exact style of each base glyph, which does
    not necessarily specify attachment points for general Latin
    diacritics).
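
    The horizontal-centering fallback can be sketched as follows, assuming
    hypothetical glyph metrics given as ink bounds in font units; GlyphBox and
    center_mark_offset are illustrative names, not any renderer's API. As noted
    above, the vertical position is left untouched.

```python
from dataclasses import dataclass


@dataclass
class GlyphBox:
    # Hypothetical ink bounds of a glyph, in font units, relative to its origin.
    x_min: int
    x_max: int


def center_mark_offset(base: GlyphBox, mark: GlyphBox) -> int:
    """X offset (relative to the base glyph's origin) at which to draw the mark
    so that the center of its ink box lies over the center of the base's ink box.
    Only the horizontal position is computed; vertical placement is unchanged."""
    base_center = (base.x_min + base.x_max) / 2
    mark_center = (mark.x_min + mark.x_max) / 2
    return round(base_center - mark_center)


# Illustrative numbers: a base letter inked from 50 to 550, and a zero-width
# combining mark (such as a cedilla) inked from -200 to 0, i.e. left of its origin.
print(center_mark_offset(GlyphBox(50, 550), GlyphBox(-200, 0)))  # 400
```

    Drawing the mark at x = 400 puts its ink at 200 to 400, centered on 300,
    which is the center of the base's ink box.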

    > Regardless, other symbols will most always
    > have completely different metrics. It's unlikely that a font engine
    > will calculate the different heights, advance widths, and so forth,
    > in order to approximate a correct placement of the combining
    > character glyph. It's probably equally unlikely that a font developer
    > will add a potentially infinite number of GPOS rules to a font's tables
    > in order to accomplish this with every conceivable arbitrary base
    > character glyph.

    For notational purposes, this is still what renderers do when
    positioning diacritics on a dotted circle, given that fonts themselves do
    not specify such advanced positioning (or substitution for resizing).

    What is the difference between positioning a diacritic (like a Lao vowel)
    on a base dotted circle, and positioning the same diacritic on another
    base symbol like a cross (or something else like a circle, square, dotted
    square, crosshatch, horizontal stroke, or checkerboard grid)? I've seen
    various symbols used to denote the absence of a specific base letter in
    texts written in Latin script. Why would this not exist for Lao too?

    Is the proposer of the x-like cross sure that this convention is not
    arbitrary and specific to some authors? What is clear is that the chosen
    symbol should not be confusable with an existing letter (that's why
    a simple circle or a cross was not an appropriate base symbol for
    denoting the position of a Latin, Greek, or Cyrillic base letter).

    But is the Unicode convention of using a dotted circle for such
    notational use the best option for all scripts? Isn't there a script where a
    dotted circle character carries a meaning other than a purely symbolic
    graphical one, so that the conventional dotted circle could become
    confusable in that script? I have not seen anything in Unicode stating
    that using a dotted circle in this case is normative, and this is a good
    reason not to implement this feature in fonts, but only in renderers,
    which have better knowledge of the context of use and can determine whether
    they really need to display that symbol and which symbolic glyph is the most
    appropriate.
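
    A renderer-side choice of placeholder glyph might look like the sketch
    below. The candidate list is a hypothetical, caller-supplied preference for
    the current script or context (for example U+2B1A DOTTED SQUARE or U+00D7
    MULTIPLICATION SIGN as an x-like cross); font_cmap stands in for the set of
    code points the font maps. Falling back to U+25CC DOTTED CIRCLE is an
    assumption of this sketch, not a Unicode requirement.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def choose_placeholder(preferred, font_cmap):
    """Return the first preferred placeholder the font can display, else U+25CC.

    preferred: hypothetical list of candidate symbols for the current context.
    font_cmap: set of code points for which the font has a glyph.
    """
    for ch in preferred:
        if ord(ch) in font_cmap:
            return ch
    return DOTTED_CIRCLE


print(unicodedata.name(DOTTED_CIRCLE))  # DOTTED CIRCLE
cmap = {0x25CC, 0x00D7}                 # illustrative font coverage
print(hex(ord(choose_placeholder(["\u2b1a", "\u00d7"], cmap))))  # 0xd7
```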



    This archive was generated by hypermail 2.1.5 : Wed Jul 11 2007 - 09:57:57 CDT