Re: RTL PUA?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 25 Aug 2011 09:05:55 +0200

2011/8/25 Peter Constable <petercon_at_microsoft.com>:
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Philippe Verdy
>
>> But I suspect that the strong opposition given by Peter Constable...
>
> Yet again, I think you're putting words in my mouth. The only thing I think I've explicitly spoken against in this thread is changing the default bidi category of PUA characters to ON.

Something that will break all existing implementations, but will not
solve the problem; it will just reduce the number of Bidi controls
needed in texts: BC=ON only means that the resolved direction of
PUA characters will come from the resolved direction of previous
(non-PUA) characters. It does not work at the beginning of paragraphs.
The actual direction properties should be overridable to a *strong*
RTL direction in place of the default, instead of being changed to
something extremely weak and contextual.
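
To make the BC=ON point concrete, here is a minimal sketch of how neutral
runs get resolved. It is a deliberate simplification of UBA rules P2/P3 and
N1/N2: it only knows the classes L, R and ON, assumes a single embedding
level, and ignores numbers, explicit controls and bracket pairs. The
function names are hypothetical and not taken from any real implementation.

```python
def paragraph_direction(classes):
    """Simplified P2/P3: the first strong class decides, otherwise LTR."""
    for c in classes:
        if c in ("L", "R"):
            return c
    return "L"


def resolve_neutrals(classes, base=None):
    """Simplified N1/N2 for a single level containing only L, R and ON.

    A run of ON takes the direction of the strong text on both sides when
    they agree; otherwise it takes the paragraph (embedding) direction.
    """
    base = base or paragraph_direction(classes)
    out = list(classes)
    i, n = 0, len(classes)
    while i < n:
        if classes[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "ON":
            j += 1
        before = classes[i - 1] if i > 0 else base  # start of run uses base
        after = classes[j] if j < n else base       # end of run uses base
        direction = before if before == after else base
        out[i:j] = [direction] * (j - i)
        i = j
    return out


# A paragraph made only of PUA characters reclassified as ON: nothing strong
# is available, so everything falls back to LTR.
print(resolve_neutrals(["ON", "ON", "ON"]))

# PUA run opening an LTR paragraph, followed by strong RTL text.
print(resolve_neutrals(["ON", "ON", "R"], base="L"))
```

Under these assumptions, an RTL script encoded in the PUA with BC=ON ends up
LTR whenever no suitable strong context is present, so Bidi controls would
still be needed in those cases.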

>> In fact when Peter says that the Bidi processing and the OpenType layout
>> engine are in separate layers (so that the OpenType layout works in a lower
>> layer and all BiDi processing is done before any font details are inspected),
>> I think that this is a perfect lie:
>
> The Unicode Bidi Algorithm uses _character_ properties and operates on _characters_. OpenType Layout tables deal only with glyphs.

You're repeating again what I also know and used in my arguments. I
have never stated that the Bidi algorithm operates at the glyph level;
I have clearly said the opposite. You are only searching for a
contradiction that does not exist.

>> At least the Uniscribe layout already has to inspect the content of any OpenType
>> font, at least to process its "cmap" and implement the font fallback mechanism,
>> just to see which font will match the characters in the input string to render.
>
>> If it can do that, it can also inspect later a table in the selected font to see which
>> PUAs are RTL or LTR. And it can do that as a source of information for BiDi ...
>
> In theory, that could be done. A huge problem with your suggestion, though, is that the bidi algorithm deals only with characters and makes no references whatsoever to font data, and for that reason -- I would hazard to guess -- most implementations of the Unicode bidi algorithm do not rely in any way on font data and would need significant re-engineering to do so.

You repeat again your argument, which I have not contradicted, but this
has nothing to do with what I want to express. And anyway, a
reengineering will be needed in all the proposed solutions (except if
we have to encode the Bidi controls around those PUAs, something that
we really want to avoid as often as we avoid them for non-PUA
characters).

The Bidi algorithm is not changed in any way: it still uses the
character properties, except that the source of the property values
for PUA characters should be overridable (not taken only from the
standard UCD), as already permitted by the Unicode standard, which just
assigns them *default* property values.
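
As a rough sketch of what such an override could look like, the lookup
below consults a private-use override table first and falls back to the
default UCD value otherwise. The `bidi_class` and `is_private_use` helpers
and the `overrides` table are hypothetical; only `unicodedata.category` and
`unicodedata.bidirectional` are real Python standard library calls. Where
the override data comes from (a font table, a private agreement, some other
external source) is left open here.

```python
import unicodedata


def is_private_use(ch):
    """True for code points whose General_Category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


def bidi_class(ch, pua_overrides):
    """Return the Bidi class to feed into an unchanged Bidi algorithm.

    Overrides are honoured only for private-use characters; every other
    character keeps the value from the standard UCD.
    """
    if is_private_use(ch) and ch in pua_overrides:
        return pua_overrides[ch]
    return unicodedata.bidirectional(ch)


# Hypothetical agreement: U+E000..U+E01F hold letters of an RTL script.
overrides = {chr(cp): "R" for cp in range(0xE000, 0xE020)}

print(bidi_class("\ue000", overrides))  # overridden private-use character
print(bidi_class("\ue100", overrides))  # private use, no override: UCD default
print(bidi_class("A", overrides))       # non-PUA: always the UCD value
```

The Bidi algorithm itself stays untouched in this arrangement; only the
property lookup in front of it changes.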

If a Bidi algorithm implementation does not allow such overrides, it
is already broken and has to be fixed, because it was insufficiently
engineered. The fact that it cannot process font data at the step
specified in the OpenType specification is a defect of that
specification, which is incomplete. But even if you don't want to add
such a data table to fonts, the external data will have to come from
somewhere else. Otherwise only the default property values will be
used.
Received on Thu Aug 25 2011 - 02:09:37 CDT
