From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Jul 23 2003 - 05:40:03 EDT
On 22/07/2003 20:34, John Hudson wrote:
> At 06:00 PM 7/22/2003, Rick McGowan wrote:
>
>> A solution with CGJ has been proposed, which is very general and can be
>> applied to this and other such situations.
>
> I get the impression that CGJ support is not very high on the list of 
> things going to be implemented any time soon by the application 
> developers that matter to us. I'm not saying this is right, only that 
> it raises practical concerns about recommending this solution. Other 
> control characters that have been around longer may not pose this 
> problem, but may still require updates to existing Hebrew engines. I'm 
> currently trying to figure out what works and what does not in the 
> existing implementations. We're already recommending ZWNJ to inhibit 
> meteg + hataf vowel ligation, but this has problems because the control 
> character breaks the mark positioning lookups. I've yet to determine 
> whether this is a fault in the font lookups, the shaping engine, 
> particular apps or text services, or something fundamental to the 
> architecture.
>
> John Hudson
>
> Tiro Typeworks        www.tiro.com
> Vancouver, BC        tiro@tiro.com
I hope you are not suggesting that some application developers are 
prepared to implement changes to support proposals they have themselves 
put forward to the UTC, but not to support alternative fixes for the 
same problems which the UTC may prefer because they are acceptable to 
users. That would be an acceptable position only if the alternative fix 
were much harder to implement than the preferred proposal. But in this 
case the alternative fix, using CGJ, seems to be a very trivial matter 
for a rendering engine. All it needs to do is delete any CGJ character 
from its input stream before it attempts any positioning, but only after 
any normalisation has been done. Of course, this doesn't mean that any 
particular rendering engine can currently be programmed to do this.
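
To illustrate why the order of these two steps matters, here is a 
minimal Python sketch using the standard unicodedata module. It is an 
illustration only, not any engine's actual code, and both function names 
are hypothetical. CGJ (U+034F) has combining class 0, so normalisation 
will not reorder marks across it; if CGJ is deleted first, normalisation 
is free to swap the marks it was separating. The test string is a lamed 
with patah and hiriq, the mark pair at issue in Yerushala(y)im.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER (combining class 0)


def strip_cgj_after_normalising(text: str) -> str:
    # Correct order: normalise first (CGJ blocks canonical reordering
    # across it), then delete CGJ before any positioning is attempted.
    return unicodedata.normalize("NFC", text).replace(CGJ, "")


def strip_cgj_before_normalising(text: str) -> str:
    # Wrong order: once CGJ is gone, normalisation reorders the marks
    # it was separating into canonical order.
    return unicodedata.normalize("NFC", text.replace(CGJ, ""))


# Lamed + patah + CGJ + hiriq
s = "\u05DC\u05B7\u034F\u05B4"
print([hex(ord(c)) for c in strip_cgj_after_normalising(s)])   # ['0x5dc', '0x5b7', '0x5b4']
print([hex(ord(c)) for c in strip_cgj_before_normalising(s)])  # ['0x5dc', '0x5b4', '0x5b7']
```

In the second case hiriq (class 14) has been moved ahead of patah 
(class 17), which is exactly what the CGJ was inserted to prevent.
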

In fact it seems to me that the biblical Hebrew rendering problems I 
have heard about (on various lists and privately) could be solved easily 
by introducing a simple pre-processing pass into the rendering engine. 
(This would not, however, fix the Yerushala(y)im problem or the meteg 
ordering problem.) This pre-processing pass should sort each combination 
of a base letter and its following combining marks into an order that is 
efficient for the rendering engine, not necessarily the Unicode canonical 
order: for example, according to the "custom combining classes" of 
ftp://publisher.libronix.com/drop/Tiro/SBLHebrew-Distribution/SBLHebrew-Manual.pdf. 
It should also delete characters that are not actually to be rendered, 
e.g. CGJ. This pass would also satisfy the recommendation of Unicode 
conformance requirement C9 in 
http://www.unicode.org/book/preview/ch03.pdf: "Ideally, an 
implementation would always interpret two canonical-equivalent character 
sequences identically." Since in any practical case this is a sort of no 
more than four or five combining characters by fixed classes, it can be 
performed very quickly if programmed into the rendering engine at the 
binary level (though not necessarily if attempted in the engine's 
high-level language, which is not designed for this), especially as 
shortcuts such as hash tables can be used for commonly encountered input 
orderings, including the Unicode canonical ordering.
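
A minimal Python sketch of such a pre-processing pass, under stated 
assumptions. The class values in render_class are hypothetical 
placeholders, NOT the custom combining classes published in the SBL 
Hebrew manual. A "cluster" is taken to be one non-mark character followed 
by every character of general category M (CGJ included). 
functools.lru_cache stands in for the hash-table shortcut, so a mark 
sequence seen before is looked up rather than sorted again. Python's 
sorted() is stable, so marks given the same class keep their input order.

```python
import unicodedata
from functools import lru_cache

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def render_class(mark: str) -> int:
    """HYPOTHETICAL rendering-order class for a Hebrew mark (illustrative only)."""
    cp = ord(mark)
    if cp in (0x05C1, 0x05C2):  # shin dot, sin dot
        return 10
    if cp in (0x05BC, 0x05BF):  # dagesh/mapiq, rafe
        return 20
    if 0x05B0 <= cp <= 0x05BB or cp == 0x05C7:  # vowel points, incl. qamats qatan
        return 30
    if cp == 0x05BD:  # meteg
        return 40
    if 0x0591 <= cp <= 0x05AF:  # cantillation marks
        return 50
    return 60  # any other mark goes last


@lru_cache(maxsize=4096)
def reorder_marks(marks: str) -> str:
    """Drop CGJ, then stable-sort the remaining marks by render_class."""
    kept = [m for m in marks if m != CGJ]
    return "".join(sorted(kept, key=render_class))


def is_mark(ch: str) -> bool:
    """True for combining marks (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def preprocess(text: str) -> str:
    """Normalise, then reorder each base+marks cluster and drop CGJ."""
    text = unicodedata.normalize("NFC", text)  # normalise BEFORE deleting CGJ
    out = []
    i = 0
    while i < len(text):
        if not is_mark(text[i]):
            out.append(text[i])  # base character
            i += 1
        j = i
        while j < len(text) and is_mark(text[j]):
            j += 1
        out.append(reorder_marks(text[i:j]))  # marks following the base
        i = j
    return "".join(out)


# Bet + dagesh + patah and bet + patah + dagesh are canonically equivalent.
print(preprocess("\u05D1\u05BC\u05B7") == preprocess("\u05D1\u05B7\u05BC"))  # True
```

Canonically equivalent inputs come out identical, as C9 asks. Because 
patah and hiriq share a class here, the patah-before-hiriq order that CGJ 
protected through normalisation also survives the sort. A fixed-class 
sort like this does nothing for the meteg ordering problem, which the 
paragraph above also sets aside.
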
-- Peter Kirk peter.r.kirk@ntlworld.com http://web.onetel.net.uk/~peterkirk/
This archive was generated by hypermail 2.1.5 : Wed Jul 23 2003 - 06:23:30 EDT