Re: RTL PUA? from Philippe Verdy on 2011-08-23 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 24 Aug 2011 04:21:47 +0200

2011/8/24 John Hudson <john_at_tiro.ca>:
> Philippe Verdy wrote:
>
>> Rereading closely the OpenType spec...
>
> I suggest you read also the script-specific OT layout specifications.
>
> http://www.microsoft.com/typography/SpecificationsOverview.mspx
>
> You'll note, for example, that the Arabic font spec doesn't even mention
> BiDi, because it is assumed that this has been resolved before glyph runs
> for OTL processing are even identified. This makes sense to me because BiDi
> is a character-centric operation.
>
> The Microsoft font specs describe what Uniscribe (and DWrite) do with text
> and fonts for particular scripts, and there may be some differences in other
> implementations. For example, Uniscribe performs invalid mark sequence
> checks that others, preferring to see this as a task for spellcheckers, do
> not. But the glyph selection and positioning results should be the same
> across implementations. Font makers need to know how text is processed and
> OTL features applied in order to make fonts that work with resulting glyph
> runs and input strings. Changing the point in the glyph string resolution
> when BiDi is applied breaks everything. It's a complete non-starter.

I had already read these subspecs, and I think you're wrong, because
the list of glyphs is in resolved order: after all ligature
substitution and glyph breaking (for Indic scripts), it has an order
completely independent of the logical reading order of the characters.

You can perfectly well run the BiDi algorithm after the glyph
substitutions. All the BiDi algorithm does is delimit runs of
characters that are to be rendered in one direction or the other. The
same limits will also be the boundaries between the associated runs of
glyph ids.
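To make this concrete, here is a minimal Python sketch, assuming a HarfBuzz-style model in which every output glyph records a `cluster` value (the index of the first source character it came from). The `Glyph`, `CharRun` and `glyph_runs` names and the glyph ids are hypothetical; the sketch only shows how character-level BiDi run boundaries carry over to the glyph list.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    gid: int      # glyph id after substitution (hypothetical values)
    cluster: int  # index of the first source character this glyph came from


@dataclass
class CharRun:
    start: int    # first character index (inclusive)
    end: int      # one past the last character index
    rtl: bool     # direction resolved by the BiDi algorithm


def glyph_runs(glyphs, char_runs):
    """Split the glyph list at the same boundaries as the character runs."""
    return [
        (run.rtl, [g for g in glyphs if run.start <= g.cluster < run.end])
        for run in char_runs
    ]


# Characters 0-1 became one ligature glyph (cluster 0); characters 2-3 map 1:1.
glyphs = [Glyph(30, 0), Glyph(12, 2), Glyph(20, 3)]
runs = [CharRun(0, 2, rtl=False), CharRun(2, 4, rtl=True)]
for rtl, members in glyph_runs(glyphs, runs):
    print("RTL" if rtl else "LTR", [g.gid for g in members])
# LTR [30]
# RTL [12, 20]
```

This mapping is unambiguous only as long as no glyph's components come from two runs of different direction.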

In fact, the glyph substitutions do not need the BiDi algorithm at
all, because they will be performed exactly the same way. The two
algorithms are completely independent of each other, at least as long
as you don't need to apply substitutions that span distinct runs.
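The independence claim can be checked on a toy model. The Python sketch below uses a hypothetical pair-ligature table rather than real GSUB lookups: substituting over the whole glyph string or over each directional run separately gives the same result, and the two diverge only when a rule's input straddles a run boundary (the `["f"], ["i"]` split simply pretends a direction boundary falls between those glyphs).

```python
# Hypothetical two-glyph ligature table: (first, second) -> ligature glyph.
LIGATURES = {("f", "i"): "f_i", ("lam", "alef"): "lam_alef"}


def substitute(glyphs):
    """Greedy left-to-right pair ligation: a toy stand-in for GSUB."""
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def substitute_per_run(runs):
    """Apply the same substitution to each directional run on its own."""
    return [g for run in runs for g in substitute(run)]


runs = [["f", "i", "x"], ["lam", "alef"]]  # an LTR run, then an RTL run
whole = [g for run in runs for g in run]
print(substitute(whole) == substitute_per_run(runs))  # True: no rule spans the boundary

split = [["f"], ["i"]]  # a ligature pair cut by a (pretend) direction boundary
print(substitute(["f", "i"]), substitute_per_run(split))  # ['f_i'] ['f', 'i']
```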

However, there is a dependency between the BiDi algorithm and glyph
positioning, because each RTL or LTR run needs its own left-side
bearing and its own right-side bearing in order to space these runs
correctly relative to each other. It also influences the direction in
which you advance the coordinates along the baseline when positioning
the fully resolved glyph ids. This then requires knowing the principal
direction of each run of glyph ids.
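A minimal Python sketch of why positioning, unlike substitution, needs each run's direction. It assumes the runs have already been reordered into visual order (UAX #9 rule L2) and ignores side bearings and GPOS adjustments; the `position_runs` name and its data layout are hypothetical.

```python
def position_runs(runs):
    """Return the x origin of each glyph.

    runs: list of (rtl, advances) in left-to-right visual order; each run's
    advance widths are listed in logical order. Side bearings are ignored.
    """
    x = 0
    positions = []
    for rtl, advances in runs:
        width = sum(advances)
        if rtl:
            # RTL: the pen starts at the run's right edge and moves leftward.
            pen = x + width
            for adv in advances:
                pen -= adv
                positions.append(pen)
        else:
            # LTR: the pen starts at the run's left edge and moves rightward.
            pen = x
            for adv in advances:
                positions.append(pen)
                pen += adv
        x += width  # the next run starts where this one ends visually
    return positions


print(position_runs([(False, [5, 5]), (True, [3, 4])]))  # [0, 5, 14, 10]
```

The same glyph ids with the same advances land at different x positions depending only on the run's direction flag, which is the dependency described above.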

In fact, you have not demonstrated at all that this approach would
break anything, except ligatures between RTL and LTR characters, i.e.
between resolved RTL and LTR glyphs, something that can only occur at
a boundary between a resolved RTL run of glyph ids and a resolved LTR
run of glyph ids. I have been told that OpenType layout does not
support such a thing, or that this possible behavior is for now
undocumented in the OpenType specs, whereas this is not the case for
AAT layout and Graphite layout. But I admit that this would cause
problems for positioning such ligature glyphs, which would have an
ambiguous direction because they would belong to two successive
directional runs at the character level.

As the above paragraph may not be very clear, let's suppose that you
wanted to create a GSUB ligature between ARABIC LETTER LAM (resolved
to RTL at the character level) and LATIN CAPITAL LETTER A (resolved to
LTR at the character level by the BiDi algorithm). You would map this
pair to a "LAM_A" glyph id through a GSUB ligature substitution.
Technically, nothing in OpenType GSUB forbids you from doing that in
your font. But an OpenType engine that needs to maintain equivalent
boundaries between runs of characters (from BiDi) and runs of glyph
ids (from the cmap, then after GSUB substitutions) will not know
whether the LAM_A glyph belongs to the first run (ending with the RTL
character LAM) or the second run (starting with the LTR character A),
unless *each* GSUB rule provides an indication of where to place the
new direction boundary when there was a direction boundary in the
middle of the source list of glyphs before the substitution.
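The LAM_A ambiguity can be sketched in Python with the standard `unicodedata` module. Here each character's strong Bidi_Class (R or AL versus everything else) stands in for the level the full BiDi algorithm would resolve, which is a deliberate simplification; `ligature_direction` is a hypothetical helper, not part of any shaping engine.

```python
import unicodedata


def strong_rtl(ch):
    """Crude per-character direction: True for Bidi_Class R or AL, else False.

    A stand-in for the resolved levels the full BiDi algorithm would give.
    """
    return unicodedata.bidirectional(ch) in ("R", "AL")


def ligature_direction(text, component_indices):
    """Direction of the run a ligature of these characters would belong to.

    Returns "RTL" or "LTR", or None when the components come from runs of
    different direction: the engine cannot tell where the ligature belongs.
    """
    dirs = {strong_rtl(text[i]) for i in component_indices}
    if len(dirs) != 1:
        return None
    return "RTL" if dirs.pop() else "LTR"


text = "\u0644A"  # ARABIC LETTER LAM followed by LATIN CAPITAL LETTER A
print(ligature_direction(text, [0]))     # RTL
print(ligature_direction(text, [1]))     # LTR
print(ligature_direction(text, [0, 1]))  # None
```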

Yes, this is a very borderline case; I have never seen it or needed it
in practice. Unicode prefers encoding a separate, similar character
with the opposite strong direction (for example the ALEF SYMBOL,
U+2135, used in maths, which is very similar to the Hebrew letter alef
but has the opposite direction; though here I wonder how it would form
a ligature with another strong LTR character that is also not a
diacritic, even though such a pair can clearly be positioned with
GPOS, i.e. kerned).
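The directional contrast between the Hebrew letter and the mathematical symbol can be checked with Python's standard `unicodedata` module, whose `bidirectional()` returns a character's Bidi_Class. The code points used are HEBREW LETTER ALEF (U+05D0) and ALEF SYMBOL (U+2135); `bidi_report` is just an illustrative helper.

```python
import unicodedata


def bidi_report(chars):
    """Map each character's Unicode name to its Bidi_Class."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


print(bidi_report("\u05D0\u2135"))
# {'HEBREW LETTER ALEF': 'R', 'ALEF SYMBOL': 'L'}
```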

The only assumption is that GSUB preserves the boundaries between runs
of characters that are in the same direction; of course, it does not
always preserve the boundaries between logical character clusters.
This may explain your concern that this could potentially break
something, but only if you don't care about unambiguously preserving
the boundaries between directional runs and have no data hint in the
substitution rules about where to reposition the boundary after the
substitution occurred.
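What such a per-rule hint might look like, as a Python sketch: the `owner` field is purely hypothetical (no such field exists in OpenType GSUB) and records which component's directional run the ligature glyph should join, so the run boundary can be repositioned after the substitution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LigatureRule:
    components: tuple  # input glyph names, in logical order
    ligature: str      # output glyph name
    owner: int         # HYPOTHETICAL hint: component whose run receives the ligature


@dataclass
class Glyph:
    name: str
    run: int  # index of the directional run this glyph belongs to


def apply_rule(glyphs, rule):
    """Replace each match of rule.components; the result joins the owner's run."""
    n = len(rule.components)
    out, i = [], 0
    while i < len(glyphs):
        window = glyphs[i:i + n]
        if tuple(g.name for g in window) == rule.components:
            out.append(Glyph(rule.ligature, window[rule.owner].run))
            i += n
        else:
            out.append(glyphs[i])
            i += 1
    return out


source = [Glyph("lam", 0), Glyph("A", 1)]  # RTL run 0, then LTR run 1
print(apply_rule(source, LigatureRule(("lam", "A"), "LAM_A", owner=0)))
# [Glyph(name='LAM_A', run=0)]
```

Without the `owner` value, `apply_rule` would have no principled way to choose between run 0 and run 1, which is exactly the ambiguity described above.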
Received on Tue Aug 23 2011 - 21:27:40 CDT
