Re: 28th IUC paper - Tamil Unicode New

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 22 2005 - 17:20:11 CDT

    From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
    > Adam Twardoch wrote:
    >
    >> Richard Wordingham wrote:
    >>
    >>> By the way, why can't font-encoded Tamil (e.g. using ASCII codes as a
    >>> hack) display be handled on Windows by a GSUB table that handles the
    >>> re-ordering? Or would that make it Level-2 anyway? Where can I find a
    >>> definition of 'Level-2'?
    >>
    >> GSUB tables don't handle the reordering in Indic languages. It's the
    >> responsibility of the OpenType Layout processor, e.g. Uniscribe.

    How can an OpenType Layout processor reorder glyphs correctly when all it
    knows from a font is the mapping of single, whole codepoints to glyphs?
    That mapping is not enough for characters whose composite glyphs have parts
    that must be reordered separately, when those parts have no individual
    codepoints assigned to them.

    To work reliably, the fonts would have to be specially marked so that the
    glyphs associated with each part are assigned predictable PUA codepoints
    where they can be found in the font's codepoint-to-glyph table.

    This suggests an OpenType "feature" that defines the necessary character
    part-to-glyph ID mappings (most probably these IDs would be PUAs in
    Unicode-compatible fonts), plus a standard for encoding those parts with
    known reordering semantics (so that matras are displayed properly after the
    first reordering level of Halant and Ra+H). With that feature, the
    "composite" characters in the original string would be decomposed into
    their PUA parts, and the necessary reordering could then occur only within
    runs of characters belonging to the same script, for which the renderer
    maintains the private agreement needed to support the PUAs described in the
    feature table.
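
    As a rough illustration of the decompose-then-reorder idea, here is a
    minimal Python sketch. It does not use the PUA scheme described above; as
    a stand-in it assumes the canonical decompositions that Unicode already
    defines for the Tamil two-part vowel signs (U+0BCA, U+0BCB, U+0BCC). The
    names PRE_BASE, is_consonant and visual_order are made up for the example,
    and only a single consonant followed by a vowel sign is handled, not
    conjuncts.

```python
import unicodedata

# Tamil vowel signs that are drawn to the left of the base consonant.
PRE_BASE = {"\u0bc6", "\u0bc7", "\u0bc8"}  # E, EE, AI


def is_consonant(ch):
    """Rough test: the Tamil consonant range KA..HA (assumption: no conjuncts)."""
    return "\u0b95" <= ch <= "\u0bb9"


def visual_order(text):
    """Split two-part vowel signs (NFD), then move pre-base parts before the base."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in PRE_BASE and out and is_consonant(out[-1]):
            # Insert the pre-base part just before the consonant it follows.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


# KA + vowel sign O (U+0BCA) becomes: sign E, KA, sign AA.
parts = visual_order("\u0b95\u0bca")
```

    Each element of the result is still a separate code point, so a renderer
    working this way could look every part up in the ordinary
    codepoint-to-glyph table.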

    Without such a feature, the font just looks like a collection of glyphs of
    which only some are mapped to single codepoints, while the others are bound
    to no codepoint at all (such as the Tamil pre-base/post-base parts of
    matras).

    Well, it's difficult to find the docs now: the website www.opentype.org no
    longer points to the specs, but rather redirects to a single page hosted by
    Monotype Imaging. And I can't find the necessary OpenType docs on the
    Microsoft.com/typography site: it only speaks about TrueType, and the
    TrueType specs now focus on VOLT and do not say much about OpenType
    features. When I look at the section related to Tamil, the most important
    pages now link to redirected 404 "blank" pages, so the specs are now
    incomplete...

    Is the OpenType "standard" dead?

    In an old version of the spec that I have, OpenType fonts for Tamil had to
    support the following GSUB features to work with the Uniscribe reordering
    engine:
    (1) the Language-based forms:
      'akhn' to substitute akhand ligatures
      'half' to substitute half-forms (pre-base forms)
    (2) the conjuncts & typographical forms:
      'pres' for pre-base substitutions
      'abvs' for above-base substitutions
      'blws' for below-base substitutions
      'psts' for post-base substitutions
    (3) the halant forms:
      'haln' for halant form substitutions
    The problem with these features is that they are tables mapping strings of
    glyph IDs, but how can we compute the glyph IDs that represent pre-base and
    post-base half-matras (or the two halves of KSSA), and that are needed
    before looking up those tables?
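
    To make the point concrete, here is a small Python sketch of a
    ligature-style substitution over glyph IDs. It is not Uniscribe's or any
    font's actual lookup code: the function apply_ligatures, the table layout
    and all glyph ID numbers are hypothetical, and trying the longest match
    first is a simplifying assumption.

```python
def apply_ligatures(glyphs, table):
    """Replace runs of glyph IDs found in `table` with a single glyph ID.

    `table` maps tuples of input glyph IDs to one output glyph ID.
    """
    max_len = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate run first, down to length 2.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # No run starting here matched: copy the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


# Hypothetical glyph IDs: 10 = KA, 20 = halant, 11 = SSA, 300 = KSSA ligature.
AKHN = {(10, 20, 11): 300}
shaped = apply_ligatures([10, 20, 11, 40], AKHN)
```

    The input is already a list of glyph IDs, so a part that has no codepoint
    (and therefore no entry in the codepoint-to-glyph table) cannot appear in
    it unless something outside these tables supplies its ID.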

    Also, this model does not really work with reordered glyphs; instead it
    successively creates ligatures, each painted as a single glyph. This makes
    it difficult to perform sub-cluster selection (for example, selecting the
    whole matra without the base consonant) or to give a part a distinct color.
    For those things we need distinct codes for each part, and possibly
    contextual but still distinct, separate shapes that really do need to be
    ordered, unlike the substitutions above, which have no order because they
    produce a single glyph ID for each input pair or triplet of glyph IDs...

    A refined model would encode special features that give the effective
    semantics of glyph IDs before the GSUB features are applied (alternatively,
    the composed glyph ID generated by the OpenType GSUB tables could be
    decomposed with a final ordered 1-to-N substitution feature, but this still
    requires feeding the input text with glyph IDs for half matras...).
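
    The alternative in parentheses, a final ordered 1-to-N substitution, could
    look roughly like the following Python sketch. The function name
    apply_multiple_substitution and every glyph ID are invented for the
    example; it only shows the shape of such a step, not an existing feature.

```python
def apply_multiple_substitution(glyphs, table):
    """Expand each glyph ID listed in `table` into its ordered list of parts."""
    out = []
    for gid in glyphs:
        # Glyphs without an entry pass through unchanged.
        out.extend(table.get(gid, [gid]))
    return out


# Hypothetical IDs: 300 = composed cluster glyph,
# 50 = pre-base half matra, 10 = base consonant, 51 = post-base half matra.
DECOMPOSE = {300: [50, 10, 51]}
expanded = apply_multiple_substitution([300, 40], DECOMPOSE)
```

    After this step each part has its own glyph ID again, in visual order,
    which is what sub-cluster selection or per-part coloring would need.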

    All this leaves me very perplexed about the portability of fonts across
    systems (and I fear that Uniscribe only works reliably by detecting a few
    fonts made or accepted by Microsoft, whose names would be hardwired
    internally in Uniscribe). This may explain why some scripts don't work on
    all versions of Windows, and why fonts that work with Uniscribe are
    severely tied to Uniscribe's implementation (or even worse, its version...).


