The mechanism proposed by John to handle ZWJ/ZWNJ makes the implicit 
assumption that those characters are transformed into glyphs (via the 
usual 'cmap' mechanism) and that this is the avenue to transfer the 
intent of those characters to the shaping code in the font (i.e. some 
kind of ligature lookup). I'd like to revisit that assumption.
The ZWJ/ZWNJ characters are formatting characters. Their function is 
definitely different from the function of the "regular" characters (such 
as "A"): they are a way to control the rendering of regular characters 
around them, and to express that control in plain text. The debate so 
far shows that there is no strong objection to that mechanism by itself.
In an environment richer than plain text, there is obviously the 
possibility that this control could be expressed by other means than 
characters. In the OpenType world, and in particular in the interface 
between the layout engine and the shaping code in fonts, we have more 
than plain text, or rather plain glyphs; we also have a description of 
which features should be applied to which glyphs. So instead of having 
glyphs that stand for ZWJ/ZWNJ, can we use these features?
In fact, we already do that every day. For example, an InDesign user can 
insert the two characters x and y, and apply a ligature feature (let's 
say 'dlig') to them. It seems to me that this is just what ZWJ is about. 
So InDesign could do the following given the character sequence x ZWJ y: 
map it to the glyph sequence cmap(x) cmap(y), with 'dlig' applied on those 
two glyphs. This 'dlig' application takes precedence over the UI setting, 
i.e. it happens regardless of whether the user requested 'dlig' 
explicitly. The ZWJ character is simply not mapped to the glyph stream, 
since the feature application does the job of ZWJ.
We can handle ZWNJ in the same way: the sequence x ZWNJ y is transformed 
to the glyph sequence cmap(x) cmap(y), with 'dlig' not applied on those 
two glyphs. This 'dlig' non-application takes precedence over one via 
UI, i.e. 'dlig' is not applied to these two glyphs regardless of whether 
the user requested 'dlig' explicitly.
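
To make the two cases concrete, here is a minimal sketch of what the layout engine could do before calling the shaping code. Everything in it is hypothetical: to_glyph_run is a made-up name, cmap is a plain dict standing in for the font's 'cmap' table, and recording the 'dlig' decision once per adjacent glyph pair is my own simplification, not something any existing engine is known to do.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def to_glyph_run(text, cmap, user_dlig=False):
    """Map characters to glyphs; ZWJ/ZWNJ produce no glyph.

    Returns (glyphs, pair_dlig), where pair_dlig[i] says whether 'dlig'
    is applied to the pair glyphs[i], glyphs[i + 1]. (Hypothetical model.)
    """
    glyphs = []
    pair_dlig = []
    override = None  # set by a ZWJ/ZWNJ seen since the previous glyph
    for ch in text:
        if ch == ZWJ:
            override = True       # force 'dlig' on, whatever the UI says
        elif ch == ZWNJ:
            override = False      # force 'dlig' off, whatever the UI says
        else:
            if glyphs:
                # The formatting character wins over the user's setting.
                pair_dlig.append(user_dlig if override is None else override)
            glyphs.append(cmap[ch])
            override = None
    return glyphs, pair_dlig


if __name__ == "__main__":
    toy_cmap = {"x": 10, "y": 11}  # made-up glyph IDs
    print(to_glyph_run("x" + ZWJ + "y", toy_cmap, user_dlig=False))
    print(to_glyph_run("x" + ZWNJ + "y", toy_cmap, user_dlig=True))
```

Note that the glyph list never contains anything for ZWJ or ZWNJ; only the feature decision survives.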
[Maybe a better way of thinking about the precedence stuff is to think 
entirely in markup terms:
<ligatures-on> ... x ZWNJ y ... </ligatures-on> is transformed into the 
glyph stream <dlig> ... cmap(x) </dlig> <dlig> cmap(y) ... </dlig>, i.e. 
dlig is off on the pair x y; hold your objection that a feature is 
applied to a position rather than a range for a minute.]
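
The markup view can be sketched the same way: split a ligatures-on run into the glyph ranges over which 'dlig' stays on, breaking at every ZWNJ. This is only an illustration under stated assumptions: dlig_ranges is a hypothetical name, each non-formatting character is assumed to map to exactly one glyph, and ranges are half-open (start, end) glyph indices.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def dlig_ranges(text, ligatures_on=True):
    """Return the half-open glyph ranges over which 'dlig' is on.

    Assumes one glyph per non-formatting character (a simplification).
    """
    if not ligatures_on:
        return []
    ranges = []
    start = 0   # glyph index where the current range begins
    count = 0   # number of glyphs emitted so far
    for ch in text:
        if ch == ZWNJ:
            # Close the current range; the next glyph starts a new one,
            # so no ligature can span the ZWNJ position.
            if count > start:
                ranges.append((start, count))
            start = count
        elif ch == ZWJ:
            continue  # no glyph; ligatures are already on in this run
        else:
            count += 1
    if count > start:
        ranges.append((start, count))
    return ranges


if __name__ == "__main__":
    # a x ZWNJ y b: two ranges, so x and y never share a range
    print(dlig_ranges("ax" + ZWNJ + "yb"))
```

Since a ligature lookup would only fire on glyphs inside a single range, x and y cannot ligate, while a x and y b still can.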
With this approach, we gain two things. First, not having a "formatting" 
glyph for ZWJ is IMHO a huge conceptual win, even bigger than not having 
a "formatting" character ZWJ would be. Second, what John's proposal did 
not mention (or maybe I missed it) is that it's not just the ligature 
features that have to deal with this glyph, it is all the features; 
compound that by all the formatting characters, and you will start to 
understand Paul's reaction.
It's interesting to note that this approach can be applied to other 
formatting characters as well. Either their intent can be achieved by 
the layout engine alone, without help from the font, in which case there 
is no need to show anything to the code in the font: no glyph and no 
feature results from those characters. Or their intent needs help from 
the font, and the OpenType way to ask for this help is to apply (or 
not apply) features.
All that takes care of selecting a ligature, but it does not quite take 
care of selecting cursive forms. I can see how we could define 'dlig' to 
do that (or define a 'zwj' feature that invokes the ligature lookups 
plus some single substitution lookup), but I am not sure I am happy with 
that. In fact, I am not sure I am happy with that clause in Unicode.
Eric.
[About the features applied to ranges rather than positions: think about 
it and it should be obvious 8-) It does not make sense to apply a 
ligature at a position; what makes sense is to apply a ligature on a 
range. Think about 1->n substitutions; whatever lookups apply to the 
source glyph should also apply to all the replacement glyphs - ranges 
again. I even believe that this approach is compatible with the current 
OpenType spec. More details on demand.]
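
As a small illustration of the 1->n point, here is a sketch in which every glyph carries the set of features applied to it, and a multiple substitution hands that set to each replacement glyph. The names (apply_multiple_subst, the toy glyph names) are invented for the example, and this models the idea only, not how any real OpenType implementation stores features.

```python
def apply_multiple_subst(run, source, replacement):
    """Replace each occurrence of `source` by the glyphs in `replacement`.

    `run` is a list of (glyph, features) pairs, with `features` a frozenset
    of feature tags. Every replacement glyph inherits the source glyph's
    features, so the feature effectively covers a range of glyphs.
    """
    out = []
    for glyph, features in run:
        if glyph == source:
            out.extend((g, features) for g in replacement)
        else:
            out.append((glyph, features))
    return out


if __name__ == "__main__":
    run = [("a", frozenset({"dlig"})), ("b", frozenset())]
    # 'a' decomposes into two glyphs; both keep 'dlig', 'b' is untouched.
    print(apply_multiple_subst(run, "a", ["a.part1", "a.part2"]))
```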