From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Sep 10 2004 - 18:59:42 CDT
From: "Asmus Freytag" <asmusf@ix.netcom.com>
> On the other hand, all aspects to *coloring* of characters
> do not belong in the plain text stream - but that was not
> the question.
>
> I think suggested solutions that define markup that apply to
> combining characters but place that markup outside of the
> combining sequence would be a better answer than protocols
> trying to put markup inside the combining character sequence.
>
> My personal take is that the UTC might make a recommendation
> to that effect, but it's not part of the standard proper.
> It's not clear that the issue has practical urgency - if
> I should be mistaken on that, I'd like to find out how and why.
Placing markup outside the combining sequence seems attractive, but it 
raises another difficulty: how to refer to parts of combining sequences 
(I did not say "parts of characters", because I agree that combining 
characters are not parts of characters but true abstract characters per 
the Unicode definition) when those sequences are themselves subject to 
transformations such as normalization.
A solution would be to specify in the markup which normalization to apply to 
the combining sequence before referring to its component characters, with 
some syntax like:
    <font style="color:red nfd(2,1);">e&combining-acute;</font>
which would survive a normalization of the document, such as to NFC, in:
    <font style="color:red nfd(2,1);">&e-with-acute;</font>
Here some syntax in the markup style indicates an explicit NFD normalization 
to apply to the plain-text fragment encoded in the text element, before 
specifying the range of characters to which the style applies. (Here it says 
that color:red applies to only 1 character, starting at the second one in the 
enclosed text fragment, after that fragment has been forced to NFD.)
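To make the range selection concrete, here is a minimal Python sketch of 
what a renderer might do with such a value. The function name nfd_range 
and the 1-based (start, length) convention are my assumptions, read off 
the hypothetical nfd(2,1) syntax above; nothing like this exists in any 
style language today. Only unicodedata.normalize is a real library call.

```python
import unicodedata

def nfd_range(text, start, length):
    """Split the NFD form of text into (before, selected, after).

    start is 1-based and length counts code points, mirroring the
    hypothetical nfd(start,length) style value.
    """
    decomposed = unicodedata.normalize("NFD", text)
    begin = start - 1
    end = begin + length
    return decomposed[:begin], decomposed[begin:end], decomposed[end:]

# The same selection results whether the document stores the
# precomposed (NFC) or the decomposed (NFD) form of e-with-acute.
precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # e + COMBINING ACUTE ACCENT
assert nfd_range(precomposed, 2, 1) == nfd_range(decomposed, 2, 1)
```

In both cases the selected part is the combining acute accent alone, so 
the style would keep pointing at the same abstract character after the 
document has been renormalized.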
This may seem tricky, but simpler solutions could be implemented in a style 
language, such as more basic restrictions expressed through new markup 
attributes:
    <font style="combining-color:red">&e-with-acute;</font>
where the new "combining-color" attribute implies such prenormalization and 
automatic selection of the character ranges to which the coloring applies. 
There may be better solutions that do not require augmenting the style 
language schema with many new attribute names, such as:
    <font style="color:combining(red)">&e-with-acute;</font>
Here too, Unicode itself is not affected. But markup languages and 
renderers must be seriously modified to take the new markup property names 
or values into account. 
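As a rough illustration of the "automatic selection" that combining-color 
or combining(red) would imply, a renderer could decompose the fragment and 
separate base characters from combining marks. This is only a sketch under 
my own assumptions: the function name split_base_and_marks is invented, and 
I treat any character whose general category starts with "M" as a combining 
mark. The calls unicodedata.normalize and unicodedata.category are real.

```python
import unicodedata

def split_base_and_marks(text):
    """Return a list of (is_mark, run) pairs over the NFD form of text.

    Consecutive characters of the same kind are grouped into one run, so
    a renderer could apply the color to the runs flagged is_mark only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    runs = []
    for ch in decomposed:
        # General categories Mn, Mc and Me are the combining marks.
        is_mark = unicodedata.category(ch).startswith("M")
        if runs and runs[-1][0] == is_mark:
            runs[-1] = (is_mark, runs[-1][1] + ch)
        else:
            runs.append((is_mark, ch))
    return runs

# e-with-acute yields one base run ("e") and one mark run (the accent).
runs = split_base_and_marks("\u00e9")
assert runs == [(False, "e"), (True, "\u0301")]
```

The choice of NFD here is also an assumption; a real specification would 
have to say which normalization form the selection is defined on.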