From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jun 19 2005 - 05:37:14 CDT
Patrick Andries wrote:
> And this is why it should not be possible to use these techniques in
> contextualized or cursive texts with modern days fonts (or cursors
> apparently for Tamil split vowels whose colour one would want to
> change to highlight them by first selecting them which is often not
> possible)?
Doug Ewell wrote:
> No, this is why it is not a Unicode problem.
We had a partial explanation from a FreeType developer.  It seems that 
complex layout simply becomes too complex if colour and other attributes 
have to be handled as well, though perhaps colour alone could be managed.  
(Word may have solved this problem, but I don't know for which common fonts 
or versions.  I can't repeat John Hudson's success, and I don't know what 
the critical factor is - Word version, font, or Uniscribe version.)  There 
are two aspects which make this a Unicode issue:
(1) It is yet another problem for which Unicode will be blamed - the problem 
goes away if you use typewriter order-based fonts, but these will generally 
not be Unicode-encoded, and the text, stripped of mark-up, will not, for 
Tamil, have Unicode semantics.
(2) There seems to be a school of thought that one should not put XML-type 
mark-up round individual combining marks.  Unfortunately, I'm not sure that 
one can reasonably hope to have the sort of mark-up available for grapheme 
clusters that will affect a specific combining character.
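To make point (2) concrete, here is a minimal Python sketch of what mark-up 
round individual combining marks looks like for a Tamil split vowel.  The 
names wrap_marks and strip_markup and the <hl> tag are made up for 
illustration; only the standard unicodedata and re modules are assumed.  The 
text is decomposed canonically and each combining mark is wrapped, so the two 
halves of TAMIL VOWEL SIGN O (U+0BCA) land in separate elements, cut off from 
the consonant they attach to.

```python
import re
import unicodedata


def wrap_marks(text, tag="hl"):
    """Wrap every combining mark (general category M*) in <tag>...</tag>.

    The text is first put into NFD so that a split vowel such as
    U+0BCA becomes its two parts, U+0BC6 and U+0BBE.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            out.append(f"<{tag}>{ch}</{tag}>")
        else:
            out.append(ch)
    return "".join(out)


def strip_markup(text, tag="hl"):
    """Remove the illustrative tags again, splicing the text back together."""
    return re.sub(rf"</?{tag}>", "", text)


ko = "\u0B95\u0BCA"  # TAMIL LETTER KA + TAMIL VOWEL SIGN O
marked = wrap_marks(ko)

# Each mark now sits alone inside its element, with no base character there.
print([f"U+{ord(c):04X}" for c in strip_markup(marked)])

# Stripping the mark-up and recomposing gives back the original text.
assert unicodedata.normalize("NFC", strip_markup(marked)) == ko
```

Stripped of mark-up the text keeps its Unicode semantics, but a renderer 
that shapes each element separately sees only isolated marks.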
I admit that both aspects are peripheral to Unicode, but they are not 
totally unrelated.  The first is 'marketing', and in the second case it 
could lead to a call for a special base character that would normally be 
deleted as the text either side of the mark-up was spliced together, but 
would occasionally (e.g. at the start of a line) be treated like 
non-breaking space (U+00A0) or the dashed circle Uniscribe inserts for most 
Brahmic scripts when base characters are missing.  I hope one would not need 
such a character for each script!
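A rough Python sketch of how such a base character might behave, purely as 
an illustration: PLACEHOLDER is a private-use code point standing in for the 
hypothetical character, and splice and display are invented names, not any 
existing API.  The placeholder is dropped wherever other text precedes it on 
the line, and shown as a dotted circle (U+25CC) where it survives at the 
start of a line.

```python
PLACEHOLDER = "\uE000"    # hypothetical: private-use stand-in for the special base
DOTTED_CIRCLE = "\u25CC"  # the kind of glyph shown when a base is missing


def splice(fragments):
    """Join fragments, deleting placeholders that are not at the start of a line."""
    text = "".join(fragments)
    out = []
    for i, ch in enumerate(text):
        if ch == PLACEHOLDER and i > 0 and text[i - 1] != "\n":
            continue  # real text precedes it, so the placeholder disappears
        out.append(ch)
    return "".join(out)


def display(text, visible_base=DOTTED_CIRCLE):
    """Show any surviving placeholder as a visible base (could also be U+00A0)."""
    return text.replace(PLACEHOLDER, visible_base)


ka = "\u0B95"      # TAMIL LETTER KA
sign_i = "\u0BBF"  # TAMIL VOWEL SIGN I

# Mid-line: the marked-up fragment carries a placeholder that vanishes on splicing.
spliced = splice([ka, PLACEHOLDER + sign_i])

# Start of line: nothing precedes the mark, so the placeholder is kept and displayed.
isolated = display(splice([PLACEHOLDER + sign_i]))
```

Because the placeholder here is generic rather than tied to Tamil, the 
sketch at least suggests that one such character could serve every script.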
This whole topic did come up in December 2003 
(http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0370.html).  Jon Hanna 
suggested the use of SVG fonts, but unfortunately the link he suggested in 
http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0480.html on 
9 December, namely <http://www.w3.org/TR/charmod/benoit.svg>, has vanished.
As an aside, I'm beginning to get confused by the 'order' terminology.  I 
used to assume that visual order was the(?) orderly order the eye would 
follow when reading, but that does not seem to be so for RTL scripts!  If I 
want the first sense, do I have to say typewriter order for RTL scripts?  Is 
'typewriter order' appropriate for the jamos of hangul?  I think I 
understand phonetic order, but there does seem to be special licence for 
out-of-order aspirates, as in Scottish, Irish and American English 'what' 
and I think the Burmese subscript 'h', in addition to the general licence 
for phonetic change and subsequent rule modification, as in English 'make'.
I suspect 'logical order' really just means, 'the order we like'.  I'd be 
interested to see a robust scheme for logical order in Thai, which 
'Cleanicode' apparently requires.  I'd also be interested to learn (from 
Peter Constable?) when Thai collation order was made independent of 
syllabification and thus amenable to computerisation.
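To show why I doubt a robust scheme is easy, here is a deliberately naive 
Python sketch that turns Thai's encoded (visual) order into something closer 
to phonetic order by swapping each preposed vowel (U+0E40..U+0E44) with the 
single consonant that follows it.  The function name and the 'one consonant' 
rule are my own assumptions for illustration, not anything a standard 
prescribes.

```python
# Thai preposed vowels: SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI
PREPOSED = {chr(c) for c in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    """True for THAI CHARACTER KO KAI (U+0E01) through HO NOKHUK (U+0E2E)."""
    return "\u0E01" <= ch <= "\u0E2E"


def naive_phonetic_order(text):
    """Swap each preposed vowel with the one consonant immediately after it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PREPOSED and i + 1 < len(text) and is_thai_consonant(text[i + 1]):
            out.append(text[i + 1])  # consonant first
            out.append(ch)           # then the vowel written before it
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


ke = "\u0E40\u0E01"  # SARA E + KO KAI, encoded in the order it is written
reordered = naive_phonetic_order(ke)
```

The single-consonant rule is exactly where it breaks down: when the vowel 
belongs to an initial consonant cluster it should follow the whole cluster, 
and telling a cluster from two separate syllables seems to need the very 
syllabification knowledge the sketch lacks.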
Richard.