Re: 28th IUC paper - Tamil Unicode New

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 22 2005 - 17:20:11 CDT

Next message: John Hudson: "Re: Historical Cyrillic in Unicode"

Previous message: Richard Wordingham: "Re: 28th IUC paper - Tamil Unicode New"
In reply to: Richard Wordingham: "Re: 28th IUC paper - Tamil Unicode New"
Next in thread: John Hudson: "Re: 28th IUC paper - Tamil Unicode New"
Reply: John Hudson: "Re: 28th IUC paper - Tamil Unicode New"
Reply: John Hudson: "Re: 28th IUC paper - Tamil Unicode New"
Reply: Richard Wordingham: "Re: 28th IUC paper - Tamil Unicode New"
Reply: Adam Twardoch: "Re: 28th IUC paper - Tamil Unicode New"
Reply: Antoine Leca: "Re: 28th IUC paper - Tamil Unicode New"

From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
> Adam Twardoch wrote:
>
>> Richard Wordingham wrote:
>>
>>> By the way, why can't font-encoded Tamil (e.g. using ASCII codes as a
>>> hack) display be handled on Windows by a GSUB table that handles the
>>> re-ordering? Or would that make it Level-2 anyway? Where can I find a
>>> definition of 'Level-2'?
>>
>> GSUB tables don't handle the reordering in Indic languages. It's the
>> responsibility of the OpenType Layout processor, e.g. Uniscribe.

How can an OpenType Layout processor correctly reorder glyphs, when all it
knows from a font is the binding of single (but whole) codepoints to glyphs,
and this does not work for characters that have composite glyphs that must
be reordered separately, and that don't have individual codepoints assigned
to each part?

To work reliably, this would mean that the fonts have to be specially marked
so that the glyphs associated with each part are assigned predictable PUA
codepoints where they can be found in the font's codepoint-to-glyph table.

This suggests an OpenType "feature" to do that, one that defines the necessary
character part-to-glyph ID mappings (most probably these IDs would be PUAs
in Unicode-compatible fonts), and a standard to encode those parts with
known semantics for reordering (so that matras will be displayed properly
after the first reordering level of Halant and Ra+H). With that feature,
the original string would have the "composite" characters decomposed into
their PUA parts, and then only the necessary reordering would occur within
the range of characters in strings that are part of the same script for which
the renderer maintains the private agreement needed to support the PUAs
described in the feature table.
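
To make the proposal concrete, here is a minimal sketch of the decomposition
step only. The PUA assignments (U+E000 to U+E003) and the names PUA_PARTS and
decompose_to_pua are hypothetical, invented for illustration; no existing
OpenType feature or renderer is known to define them.

```python
from typing import Dict, Tuple

# Hypothetical "private agreement" between font and renderer:
# composite vowel sign -> (pre-base part, post-base part), both in the PUA.
PUA_PARTS: Dict[str, Tuple[str, str]] = {
    "\u0BCA": ("\uE000", "\uE001"),  # TAMIL VOWEL SIGN O
    "\u0BCB": ("\uE002", "\uE001"),  # TAMIL VOWEL SIGN OO
    "\u0BCC": ("\uE000", "\uE003"),  # TAMIL VOWEL SIGN AU
}


def is_tamil(ch: str) -> bool:
    """True if ch lies in the Tamil block (U+0B80..U+0BFF)."""
    return "\u0B80" <= ch <= "\u0BFF"


def decompose_to_pua(text: str) -> str:
    """Replace composite Tamil vowel signs by their hypothetical PUA parts.

    Characters outside the Tamil block are left untouched, so the private
    agreement only applies within the script it was defined for.
    """
    out = []
    for ch in text:
        if is_tamil(ch) and ch in PUA_PARTS:
            out.extend(PUA_PARTS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

After this step every part has a codepoint of its own, so a renderer could
look each one up in the font's codepoint-to-glyph table before reordering.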

Without such a feature, the font will just look like a collection of glyphs,
of which only some are mapped to single codepoints, while the others are
bound to no codepoints (such as the Tamil pre-base/post-base parts of
matras).

Well, it's difficult to find the docs now: the website www.opentype.org no
longer points to the specs, but rather redirects to a single page hosted by
Monotype Imaging. And I can't now find the necessary OpenType docs on the
Microsoft.com/typography site: it only speaks about TrueType, and the
TrueType specs now focus on VOLT and do not say a lot about OpenType
features. When I look at the section related to Tamil, the most important
pages now link to redirected 404 "blank" pages, so the specs are now
incomplete...

Is the OpenType "standard" dead?

In an old version of the spec I have, OpenType fonts for Tamil had to support
the following GSUB features to work with the Uniscribe reordering engine:
(1) the Language-based forms:
  'akhn' to substitute akhand ligatures
  'half' to substitute half-forms (pre-base forms)
(2) the conjuncts & typographical forms:
  'pres' for pre-base substitutions
  'abvs' for above-base substitutions
  'blws' for below-base substitutions
  'psts' for post-base substitutions
(3) the halant forms:
  'haln' for halant form substitutions
The problem with these features is that they are tables mapping strings of
glyph IDs, but how can we compute the glyph IDs that represent pre-base and
post-base half-matras (or the two halves of KSSA), and that are needed
before looking up those tables?
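
As a toy model of the kind of table I mean, here is a sketch of a ligature
substitution that maps a sequence of glyph IDs to a single glyph ID. The
glyph IDs (10, 11, 12, 50) and the names CMAP, AKHN, map_to_glyphs and
apply_ligatures are made up for illustration and do not come from any real
font or from the Uniscribe implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical codepoint-to-glyph table (the role played by 'cmap').
CMAP: Dict[int, int] = {
    0x0B95: 10,  # TAMIL LETTER KA
    0x0BCD: 11,  # TAMIL SIGN VIRAMA
    0x0BB7: 12,  # TAMIL LETTER SSA
}

# Hypothetical 'akhn'-style lookup: glyph ID sequence -> one ligature glyph ID.
AKHN: Dict[Tuple[int, ...], int] = {
    (10, 11, 12): 50,  # KA + VIRAMA + SSA -> KSSA ligature
}


def map_to_glyphs(text: str) -> List[int]:
    """Map each character to a glyph ID; this must happen before any lookup."""
    return [CMAP[ord(ch)] for ch in text]


def apply_ligatures(glyphs: List[int], lookup: Dict[Tuple[int, ...], int]) -> List[int]:
    """Replace the longest matching glyph sequence at each position."""
    max_len = max((len(key) for key in lookup), default=1)
    out: List[int] = []
    i = 0
    while i < len(glyphs):
        for n in range(max_len, 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in lookup:
                out.append(lookup[key])
                i += n
                break
        else:
            # No sequence matched here: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out
```

Note that the lookup only ever sees glyph IDs: apply_ligatures cannot run
until map_to_glyphs has produced an ID for every input element, which is
exactly where glyphs without a codepoint become a problem. The output for
KSSA is also a single glyph ID, with no trace of its parts.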

Also, this model of reordering does not really work with reordered glyphs;
instead it successively creates ligatures, which will be painted as a single
glyph. This may make it difficult to perform sub-cluster selection (for
example selecting the whole matra without the base consonant), or to give it
a distinct color. For those things, we need distinct codes for each part, and
we possibly need to give them contextual but still distinct/separated
shapes that effectively need to be ordered, unlike the substitutions above
that have no order given that they produce a single glyph ID for their input
pairs or triplets of glyph IDs...

A refined model would encode special features that give the effective
semantics of glyph IDs before the GSUB features are applied (alternatively,
the composed glyph ID generated by OpenType GSUB tables could be decomposed
with a final ordered 1-to-N substitution feature, but this still requires
feeding the input text with glyph IDs for half matras...).
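
The ordered 1-to-N idea can be sketched at the character level, at least for
the three two-part Tamil vowel signs (U+0BCA, U+0BCB, U+0BCC), because
Unicode gives them canonical decompositions that Python's unicodedata module
exposes. This is only an illustration under that assumption: the names
PRE_BASE and split_and_reorder are invented here, the function handles a
single consonant-plus-vowel-sign cluster, and it says nothing about
ligatures such as KSSA or about what Uniscribe actually does.

```python
import unicodedata
from typing import List

# Tamil vowel signs that are drawn to the left of the base consonant.
PRE_BASE = {"\u0BC6", "\u0BC7", "\u0BC8"}  # VOWEL SIGN E, EE, AI


def split_and_reorder(cluster: str) -> List[str]:
    """Return the parts of one consonant + vowel-sign cluster in visual order.

    The cluster is first decomposed (1-to-N) with NFD, then each part is
    placed before or after the base. Every part stays a separate list
    element, so it could still be selected or colored on its own.
    """
    parts = list(unicodedata.normalize("NFD", cluster))
    base, signs = parts[0], parts[1:]
    pre = [s for s in signs if s in PRE_BASE]
    post = [s for s in signs if s not in PRE_BASE]
    return pre + [base] + post
```

For example, split_and_reorder("\u0B95\u0BCA") (KA followed by VOWEL SIGN O)
yields three separate elements: VOWEL SIGN E, then KA, then VOWEL SIGN AA.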

So all this leaves me very perplexed about the portability of fonts across
systems (and I fear that Uniscribe only works reliably by detecting a few
fonts made or accepted by Microsoft only, and whose names would be
internally hardwired in Uniscribe). This may explain why some scripts can't
work on all versions of Windows, and why fonts working with Uniscribe are
severely tied to Uniscribe's implementation (or even worse, its version...).


This archive was generated by hypermail 2.1.5 : Mon Aug 22 2005 - 17:21:09 CDT