From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jun 19 2005 - 05:37:14 CDT
Patrick Andries wrote:
> And this is why it should not be possible to use these techniques in
> contextualized or cursive texts with modern days fonts (or cursors
> apparently for Tamil split vowels whose colour one would want to
> change to highlight them by first selecting them which is often not
> possible)?
Doug Ewell wrote:
> No, this is why it is not a Unicode problem.
We had a partial explanation from a FreeType developer. It seems that
complex layout just becomes too complex if colour (and other things -
perhaps colour alone could be handled) has to be handled as well. (Word may
have solved this problem, but I don't know in what common fonts/versions. I
can't repeat John Hudson's success, and I don't know what the critical issue
is - Word version, font, or Uniscribe version.) There are two aspects which
make this a Unicode issue:
(1) It is yet another problem for which Unicode will be blamed - the problem
goes away if you use typewriter order-based fonts, but these will generally
not be Unicode-encoded, and the text, stripped of mark-up, will not, for
Tamil, have Unicode semantics.
(2) There seems to be a school of thought that one should not put XML-type
mark-up round individual combining marks. Unfortunately, I'm not sure that
one can reasonably hope to have the sort of mark-up available for grapheme
clusters that will affect a specific combining character.
I admit that both aspects are peripheral to Unicode, but they are not
totally unrelated. The first is 'marketing', and in the second case it
could lead to a call for a special base character that would normally be
deleted as the text either side of the mark-up was spliced together, but
would occasionally (e.g. at the start of a line) be treated like
non-breaking space (U+00A0) or the dashed circle Uniscribe inserts for most
Brahmic scripts when base characters are missing. I hope one would not need
such a character for each script!
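To make the split-vowel case concrete, here is a minimal sketch using only Python's standard unicodedata module. The example syllable and the helper name describe are choices made for illustration, not anything from the thread. It shows that the two-part Tamil vowel sign O canonically decomposes into two marks, both of which follow the consonant in the encoded text even though one half is drawn to its left, so colouring one half means putting mark-up inside a single cluster.

```python
import unicodedata

# TAMIL LETTER KA followed by the two-part TAMIL VOWEL SIGN O (a single code point).
syllable = "\u0b95\u0bca"

# Canonical decomposition (NFD) splits the vowel sign into its two visual halves.
decomposed = unicodedata.normalize("NFD", syllable)


def describe(text):
    """Return (code point, character name, general category) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c))
        for c in text
    ]


for row in describe(decomposed):
    print(*row)
```

In the decomposed form both halves of the vowel come after the consonant, so mark-up wrapped round the left-hand half alone has to sit between the base and a mark that depends on it.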
This whole topic did come up in December 2003
(http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0370.html).
Jon Hanna suggested the use of SVG fonts, but unfortunately the link he
suggested in http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0480.html
on 9 December, namely, <http://www.w3.org/TR/charmod/benoit.svg>, has
vanished.
As an aside, I'm beginning to get confused by the 'order' terminology. I
used to assume that visual order was the(?) orderly order the eye would
follow when reading, but that does not seem to be so for RTL scripts! If I
want the first sense, do I have to say typewriter order for RTL scripts? Is
'typewriter order' appropriate for the jamos of hangul? I think I
understand phonetic order, but there does seem to be special licence for
out-of-order aspirates, as in Scottish, Irish and American English 'what'
and I think the Burmese subscript 'h', in addition to the general licence
for phonetic change and subsequent rule modification, as in English 'make'.
I suspect 'logical order' really just means, 'the order we like'. I'd be
interested to see a robust scheme for logical order in Thai, which
'Cleanicode' apparently requires. I'd also be interested to learn (from
Peter Constable?) when Thai collation order was made independent of
syllabification and thus amenable to computerisation.
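As a rough illustration of why a robust logical order for Thai is hard, the sketch below swaps each preposed vowel with the consonant that follows it. The function name swap_preposed and the sample words are hypothetical, and the rule is deliberately naive: it is an assumption made for the example, not a description of any standard's algorithm.

```python
# The five Thai vowels that are written (and encoded) before their consonant.
PREPOSED = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_thai_consonant(ch):
    """True for THAI CHARACTER KO KAI (U+0E01) through HO NOKHUK (U+0E2E)."""
    return "\u0e01" <= ch <= "\u0e2e"


def swap_preposed(text):
    """Naively move each preposed vowel to just after the next consonant."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)


simple = swap_preposed("\u0e40\u0e01")  # SARA E + KO KAI
cluster = swap_preposed("\u0e40\u0e1b\u0e25\u0e48\u0e32")  # word starting with the cluster PO PLA + LO LING
```

For the single-consonant syllable the swap puts the vowel after its consonant, but for the second word the vowel lands between the two consonants of the initial cluster. Doing better requires knowing where the syllable's onset ends, which is the syllabification problem.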
Richard.