Re: RTL PUA? from Philippe Verdy on 2011-08-24 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 24 Aug 2011 08:37:42 +0200

2011/8/24 John Hudson <john_at_tiro.ca>:
> Philippe, I'll need to think about this some more and try to get a better
> grasp of what you're suggesting. But some immediate thoughts come to mind:
>
> If BiDi is to be applied to shaped glyph strings, surely that means needing
> to step backwards through the processing that arrived at those shaped glyph
> strings in order to correctly identify their relationship to underlying
> character codes, since it is the characters, not the glyphs, that have
> directional properties. There's nothing in an OT font that says e.g. GID 456
> /lam_alif.fina/ is an RTL glyph, so the directionality has to be processed
> at the character level and mapped up through the GSUB features to the
> glyphs.

No backward stepping is needed. Process the text using grapheme
clusters as the minimum unit of processing: apply normalization, then
try to cmap all the characters of a cluster from the same font (using
fallback fonts if needed). If this fails, try to cmap the individual
characters of the cluster to find a font match for each of them.
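
As a rough sketch of this matching step (not any real engine's code):
here a "font" is only a hypothetical (name, cmap dict) pair, the
function name map_cluster is mine, and NFC is an assumption since I did
not say which normalization form to use.

```python
import unicodedata

def map_cluster(cluster, fonts):
    """Return a list of (font_name, glyph_id) for one grapheme cluster.

    `fonts` is an ordered list of (name, cmap) pairs, preferred font
    first; each cmap is a dict from code point to glyph id.
    """
    cluster = unicodedata.normalize("NFC", cluster)  # assumed form
    # First attempt: a single font covering every character of the cluster.
    for name, cmap in fonts:
        if all(ord(ch) in cmap for ch in cluster):
            return [(name, cmap[ord(ch)]) for ch in cluster]
    # Fallback: match each character of the cluster on its own.
    result = []
    for ch in cluster:
        for name, cmap in fonts:
            if ord(ch) in cmap:
                result.append((name, cmap[ord(ch)]))
                break
        else:
            result.append((None, 0))  # no font at all: unresolved glyph
    return result
```

With a first font that only maps "q" and a second one that maps both
"q" and U+0301, the cluster "q" + U+0301 is taken entirely from the
second font, which is the point of trying the whole cluster first.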

This done, each character is now mapped to a definitive font and a
putative (incompletely resolved) glyph id in that font. Note that PUA
characters will be isolated at this point (each one forms its own
grapheme cluster). You can then check whether the font provides an
override for the BC property, replacing the default strong LTR value.
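
The lookup could be as simple as the sketch below. Note that
font_overrides is purely hypothetical: no existing OpenType table
carries such data, it stands for whatever the font would provide under
this proposal. Non-PUA characters deliberately ignore it.

```python
import unicodedata

def is_pua(cp):
    """True for the BMP private use area and planes 15 and 16."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

def bidi_class(ch, font_overrides):
    """Bidi class of `ch`, given the (hypothetical) overrides of its font.

    `font_overrides` maps PUA code points to a bidi class name such as
    "R" or "AL".
    """
    cp = ord(ch)
    if is_pua(cp):
        # Font override if present, else the default strong LTR value.
        return font_overrides.get(cp, "L")
    # Non-PUA characters always keep their normative value.
    return unicodedata.bidirectional(ch)
```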

Then, independently:
- you can process the list of glyphs one by one, trying to match all
applicable GSUB lookups, but only when a glyph comes from the same font
as the one associated with the previous character. You can also easily
select the typographic variants of that font, for a single glyph.
- you can update the current Bidi level of the character, using the BC
property override specified in the font containing the PUA character
if there is one, otherwise the default BC property value for PUA; for
non-PUA characters, use the normative value.
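
To illustrate the first point only, here is a minimal sketch of a
substitution pass that never crosses a font boundary. It handles just
pairwise ligatures; the `ligatures` structure is a made-up stand-in for
real GSUB lookups, which are far richer than this.

```python
def apply_ligatures(glyphs, ligatures):
    """Substitute glyph pairs, but only inside a run from a single font.

    `glyphs` is a list of (font, glyph_id) pairs in logical order;
    `ligatures` maps a font to a dict {(gid1, gid2): ligature_gid}.
    """
    out = []
    for font, gid in glyphs:
        if out:
            prev_font, prev_gid = out[-1]
            table = ligatures.get(font, {})
            # Match only when the previous glyph comes from the same font.
            if prev_font == font and (prev_gid, gid) in table:
                out[-1] = (font, table[(prev_gid, gid)])
                continue
        out.append((font, gid))
    return out
```

Since the result of a substitution stays at the end of the output list,
it can itself take part in the next match, so the pass repeats until
the glyph is no longer substitutable.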

Once the remaining glyph ids are no longer substitutable, you can
apply GPOS rules (or legacy tables for base-to-base kerning) reliably,
because you also know whether the BiDi level is even (LTR) or odd
(RTL). You can then use the glyph metrics to accumulate widths in
order to detect whether an automatic line-break can occur.
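
The width accumulation might look like the sketch below. It is
simplified on purpose: the advances are assumed to be already adjusted
by GPOS, and a real implementation would only break at a permitted
break opportunity rather than at an arbitrary glyph. Both function
names are mine.

```python
def is_rtl(level):
    """An odd BiDi level means RTL, an even one means LTR."""
    return level % 2 == 1

def find_line_break(advances, line_width):
    """Return how many glyphs, in logical order, fit within `line_width`.

    `advances` holds one advance width per glyph, in font units.
    """
    total = 0
    for i, advance in enumerate(advances):
        if total + advance > line_width:
            return i  # glyph i would overflow: break before it
        total += advance
    return len(advances)  # everything fits, no automatic break
```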

When a forced or automatic linebreak does occur, you can then adjust
the justification of glyph ids, because at that point you also know
the directionality of all characters (including that of the first
glyph of the line, from which, if this line starts a paragraph, you
have determined the main direction of the baseline).

You can also automatically adjust the widths of kashidas (or even
automatically insert them for microjustification of glyphs, according
to the joining properties of the associated characters).
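
For the kashida widths, a naive sketch is to share the remaining slack
evenly between the positions where the joining properties allow an
elongation. An even split is only my assumption here; a typographically
sound engine would rank the positions by priority.

```python
def distribute_kashida(slack, opportunities):
    """Split `slack` font units over `opportunities` kashida positions.

    Returns one integer width per position; the widths sum to `slack`.
    """
    if slack <= 0 or opportunities <= 0:
        return []  # nothing to fill, or nowhere to elongate
    base, extra = divmod(slack, opportunities)
    # The first `extra` positions absorb the remainder, one unit each.
    return [base + (1 if i < extra else 0) for i in range(opportunities)]
```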

Then you can reorder the glyph ids that are in runs opposed to the
main direction of the baseline for the paragraph.
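
One way to do this reordering is rule L2 of UAX #9: from the highest
level down to the lowest odd level, reverse every contiguous sequence
of glyphs at that level or higher. The sketch below assumes one
resolved level per glyph and ignores mirroring and ligature glyphs
that span several characters.

```python
def reorder_line(glyphs, levels):
    """Return the glyphs of one line in visual order (UAX #9 rule L2).

    `glyphs` and `levels` are parallel lists in logical order.
    """
    if not levels:
        return []
    order = list(range(len(glyphs)))
    highest = max(levels)
    # With no odd level on the line, the range below is empty.
    lowest_odd = min((l for l in levels if l % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])  # flip this run
                i = j
            else:
                i += 1
    return [glyphs[k] for k in order]
```

For glyphs "abcDEF" with levels 0,0,0,1,1,1 only the last three are
reversed, giving "abcFED"; a level 2 run nested in level 1 text is
reversed twice and so keeps its own left-to-right order.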

Some more refinements are needed for handling some text decorations
(such as underlines, which are not necessarily continuous in all
styles and may need to avoid cutting through strokes; this would
require some metrics from the font, associated with glyphs that have
descenders).

All the above can be done in parallel (i.e. character per character,
each one being handled glyph id by glyph id, as long as there are
matchable GSUB or GPOS rules). The memory requirement is limited to as
many glyphs as can fit within the margins of a single line.

Finally the line can be fully drawn with the reordered glyphs (you may
need to clip the kashidas to their autojustified width, so that they
do not overlap too far onto the surrounding joined characters).
Received on Wed Aug 24 2011 - 01:42:16 CDT
