From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Oct 16 2003 - 16:29:44 CST
On 16/10/2003 12:38, Peter Constable wrote:
>>-----Original Message-----
>>From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
>>Behalf Of Asmus Freytag
>>
>>>Canonical equivalence must be taken into account in rendering multiple
>>>accents, so that any two canonically equivalent sequences display as the
>>>same.
>>
>>This statement goes to the core of Unicode. If it is followed, it
>>guarantees that normalizing a string does not change its appearance (and
>>therefore it remains the 'same' string as far as the user is concerned.)
>
>I agree in principle. There are two ways in which the philosophy behind
>this breaks down in real life, though:
>
>1. There are cases of combining marks given a class of 0, meaning that
>combinations of marks in different positions relative to the base will
>be visually indistinguishable, but the encoded representations are not
>the same, and not canonically equivalent. E.g. (taken from someone else
>on the Indic list) Devanagari ka + i + u vs. ka + u + i.
As we are talking about rendering rather than operations on the backing 
store, this is actually irrelevant. If two sequences are visually 
indistinguishable (with the particular font in use), a rendering engine 
can safely map them together even if they are not canonically 
equivalent, as long as the backing store is unchanged.
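
To make the Devanagari example concrete: both of those vowel signs have
combining class 0, so canonical ordering never reorders them, and the two
sequences are indeed not canonically equivalent. A minimal Python sketch
(standard unicodedata module only; the variable names are mine):

    import unicodedata

    ka = "\u0915"  # DEVANAGARI LETTER KA
    i = "\u093F"   # DEVANAGARI VOWEL SIGN I
    u = "\u0941"   # DEVANAGARI VOWEL SIGN U

    # Both vowel signs have canonical combining class 0, so the
    # canonical ordering step leaves their relative order alone.
    print(unicodedata.combining(i), unicodedata.combining(u))  # 0 0

    # The two orders therefore survive normalization unchanged, and unequal:
    s1 = unicodedata.normalize("NFD", ka + i + u)
    s2 = unicodedata.normalize("NFD", ka + u + i)
    print(s1 == s2)  # False: not canonically equivalent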
>2. Relying on normalization, and specifically canonical ordering, to
>happen in a rendering engine IS liable to be a noticeable performance
>issue. I suggest that whoever wrote
>
>>Rendering systems should handle any of the canonically equivalent
>>orders of combining marks. This is not a performance issue: The amount
>>of time necessary to reorder combining marks is insignificant compared
>>to the time necessary to carry out other work required for rendering.
>
>was not speaking from experience.
I wonder if anyone involved in this is speaking from real experience. 
Peter, I don't think your old company actually tried to implement such 
reordering; Sharon tells me that the idea was suggested, but rejected 
for reasons unrelated to performance. I have heard that your new company 
has tried it and has claimed that for Hebrew the performance hit is 
unacceptable. I am still sceptical of this claim. Presumably this was 
done by adding a reordering step to an existing rendering engine. But 
was this reordering properly optimised in binary code, or was it just 
bolted on to an unsuitable architecture using a high-level language
designed for the different purpose of glyph-level reordering?

Also, as I just pointed out in a separate posting, there should be no
performance hit for unpointed modern Hebrew, as there are no combining
marks to be reordered. The relatively few users of pointed Hebrew would
prefer to see their text rendered correctly, if a little slowly, rather
than quickly but incorrectly.
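
The no-marks case can also be detected cheaply up front, so that the
reordering step is skipped entirely for such text. A sketch of such a
pre-check (the function name is mine):

    import unicodedata

    def needs_reordering(s: str) -> bool:
        # True only if the text contains at least one combining mark
        # (class > 0); unpointed modern Hebrew never does.
        return any(unicodedata.combining(ch) > 0 for ch in s)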
If, as you agree in principle, this is an issue which goes to the core 
of Unicode, should you not be prepared to take some small performance 
hit in order to conform properly to the architecture?
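
For reference, the step whose cost is in dispute is the canonical ordering
algorithm, which is equivalent to a stable sort of each maximal run of
non-starters by combining class. A minimal Python sketch, not a tuned
implementation (the function name is mine):

    import unicodedata

    def canonical_reorder(s: str) -> str:
        out, run = [], []
        for ch in s:
            if unicodedata.combining(ch) > 0:
                run.append(ch)  # collect a run of non-starters
            else:
                run.sort(key=unicodedata.combining)  # stable sort by class
                out.extend(run)
                run = []
                out.append(ch)
        run.sort(key=unicodedata.combining)
        out.extend(run)
        return "".join(out)

    # Acute (class 230) typed before dot below (class 220) is reordered:
    print(canonical_reorder("a\u0301\u0323") == "a\u0323\u0301")  # True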
> ...
>
>If what is normalized is the backing store. If what is normalized is a
>string at an intermediate stage in the rendering process, then this is
>not the case. The reason is the number of times text-rendering APIs get
>called. ...
>
If it is unavoidable to call the same routine (for sorting or any other 
purpose) multiple times with the same data, the results can be cached so 
that they do not have to be recalculated each time.
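
A sketch of that idea, memoizing a per-cluster normalization with Python's
standard functools.lru_cache (the names are mine):

    import unicodedata
    from functools import lru_cache

    @lru_cache(maxsize=4096)
    def normalized_cluster(cluster: str) -> str:
        # The expensive work runs at most once per distinct cluster;
        # repeated rendering calls cost only a cache lookup.
        return unicodedata.normalize("NFC", cluster)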
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/