From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Sep 08 2004 - 02:49:16 CDT
From: "Jony Rosenne" <rosennej@qsm.co.il>
> Peter Kirk
>> You mean, you would represent a black e with a red acute accent as
>> something like "e", ZWJ, "<red>", IBC, acute, "</red>"? That looks like
>> a nightmare for all kinds of processing and a nightmare for rendering.
>
> No, it is more like <forecolor:black, combiningcolor:red> "e" "acute"
> And there is no Unicode decision against it.
There is still no decision on whether this invisible base character will be
added. It's only a public review for now, addressing the first issue:
rendering isolated non-spacing combining marks that currently have no spacing
variant. (I think it's a good idea, as it would avoid adding most of the
missing spacing variants, notably for the non-generic L/G/C combining marks.)
Note that your suggestion of:
   <forecolor:black, combiningcolor:red> "e" "acute"
should also work with any normalized form of the same text, i.e. with:
   <forecolor:black, combiningcolor:red> "e with acute"
where the combining mark is precomposed. The issue here is that this becomes
tricky for renderers, which will need to re-decompose strings that are in a
composed normalization form before applying style.
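To illustrate the normalization problem, here is a small sketch using only
Python's standard `unicodedata` module. Under NFC, "e" followed by COMBINING
ACUTE ACCENT (U+0301) becomes the single precomposed character U+00E9, so no
separate code point is left for a style such as `combiningcolor:red` to
target; a renderer would first have to re-decompose the string with NFD. The
helper `mark_positions` is a hypothetical name used only for this
illustration.

```python
import unicodedata

def mark_positions(text: str) -> list[int]:
    """Hypothetical helper: indices of combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return [i for i, ch in enumerate(decomposed) if unicodedata.combining(ch) != 0]

decomposed_input = "e\u0301"   # "e" + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed_input)

print(len(decomposed_input), len(composed))  # 2 1
print(hex(ord(composed)))                    # 0xe9 (LATIN SMALL LETTER E WITH ACUTE)
# The accent is only addressable again once the text is decomposed:
print(mark_positions(decomposed_input), mark_positions(composed))  # [1] [1]
```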
Basically, I prefer Peter's solution:
   "e", ZWJ?, "<red>", IBC, acute, "</red>"
which is more independent of the normalization form. Then the question is
whether the text within the <red>...</red> markup should combine visually
when rendered.
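One property that plausibly helps here: ZERO WIDTH JOINER (U+200D) has
canonical combining class 0, so it becomes the last starter and prevents
canonical composition between "e" and a following combining acute. The Python
sketch below (standard library only) shows that NFC and NFD both leave such a
sequence unchanged. IBC has no code point, so it is omitted; whether a
renderer should still draw the accent over the "e" is exactly the open
question.

```python
import unicodedata

ZWJ = "\u200d"
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

plain = "e" + ACUTE
joined = "e" + ZWJ + ACUTE

print(unicodedata.combining(ZWJ))                      # 0: ZWJ is a starter
print(unicodedata.normalize("NFC", plain) == plain)    # False: composes to U+00E9
print(unicodedata.normalize("NFC", joined) == joined)  # True: ZWJ prevents composition
print(unicodedata.normalize("NFD", joined) == joined)  # True
```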
For now, I see the proposed IBC (which has no name yet) only as a way to
transform non-spacing combining marks into spacing non-combining variants
when those don't exist separately in Unicode (so it would not be recommended
for the non-spacing acute accent, which already has a spacing version that
does not require a leading IBC).
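The difference between the two acute accents is recorded in the Unicode
Character Database. In this short Python sketch using the standard
`unicodedata` module, COMBINING ACUTE ACCENT (U+0301) is a nonspacing mark
(General_Category Mn) with canonical combining class 230, whereas the spacing
ACUTE ACCENT (U+00B4) is a modifier symbol (Sk) with combining class 0, so it
does not attach to a preceding base. The `describe` function is only an
illustrative helper.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return code point, name, general category and canonical combining class."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"gc={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}")

for ch in ("\u0301", "\u00b4"):
    print(describe(ch))
# U+0301 COMBINING ACUTE ACCENT gc=Mn ccc=230
# U+00B4 ACUTE ACCENT gc=Sk ccc=0
```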
Technically, if an IBC character is added, a renderer will not necessarily
render <IBC, non-spacing combining acute> the same way as <spacing
non-combining acute accent>, even though it ideally should.
In the previous sentence, the "should" means that the existing spacing
non-combining marks remain the standard legacy way to encode these accents,
and they normally don't combine when rendered after a base letter, even if
there's markup around them (unless this markup explicitly says that they
should combine):
If I take the above example,
    "e", ZWJ?, "<red>", IBC, acute, "</red>"
the same rich text should also be renderable without the markup, as plain
text, as if it were:
    "e", ZWJ?, IBC, acute
i.e. (given the "should" above) as if it were also:
    "e", ZWJ?, spacing acute
I have placed the "?" after ZWJ to show that something would be needed to
let this last text override the non-combining, spacing behavior of the
spacing acute character. Without it, the text:
    "e", spacing acute
or equivalently (with the should above):
    "e", IBC, combining acute
would not be allowed to render as a combined e with an acute: two separate
glyphs would be rendered, and two separate character entities interpreted
(as they are today in legacy plain text).
So the question remains of how to add markup to combining marks: the
proposed IBC alone cannot solve such problems, unless there's an agreement
that ZWJ immediately followed by IBC should be rendered as if neither were
present. But in that case, a spacing acute becomes semantically and
graphically distinct from <IBC, combining acute>. This will happen in any
case with the normalization forms, because of the Unicode stability policy:
the existing spacing acute has no canonical decomposition and cannot acquire
one, so NFD leaves it unchanged (and NFKD maps it to SPACE plus the combining
acute, never to IBC).
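The rules discussed above can be sketched as a toy clustering function in
Python. Everything here is an assumption for illustration: IBC has no code
point, so the private-use character U+E000 stands in for it; the function
names are hypothetical; and the `zwj_ibc_transparent` flag models the
possible agreement that ZWJ immediately followed by IBC is rendered as if
absent. ZWJ itself is attached to the preceding cluster, as the UAX #29
grapheme cluster rules do. This is not how any real renderer works.

```python
import unicodedata

# HYPOTHETICAL: IBC has no code point; a private-use character stands in for it.
IBC = "\ue000"
ZWJ = "\u200d"

def is_combining_mark(ch: str) -> bool:
    """True for characters in the Mark categories (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def clusters(text: str, zwj_ibc_transparent: bool = False) -> list[str]:
    """Group plain text into display clusters under the rules discussed (a sketch).

    - a combining mark (or ZWJ) joins the preceding cluster;
    - IBC starts a new cluster, so a following mark is rendered on its own;
    - if zwj_ibc_transparent is set, ZWJ immediately followed by IBC is
      dropped, so the mark after them combines with the preceding base.
    """
    if zwj_ibc_transparent:
        text = text.replace(ZWJ + IBC, "")
    result: list[str] = []
    for ch in text:
        if (is_combining_mark(ch) or ch == ZWJ) and result:
            result[-1] += ch
        else:
            result.append(ch)
    return result

print(clusters("e" + IBC + "\u0301"))        # e, then a standalone acute
print(clusters("e" + ZWJ + "\u00b4"))        # spacing acute still separate
print(clusters("e" + ZWJ + IBC + "\u0301", zwj_ibc_transparent=True))  # one cluster
```

In this sketch "e", ZWJ, spacing acute still yields two clusters, because
U+00B4 is not a mark; that is the gap the "?" after ZWJ points at.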
I also note that IBC is intended to replace the need to use a standard SPACE
as the base character for building a spacing variant of a combining mark when
no standard spacing variant is encoded in Unicode. (This is a legacy hack,
which causes various problems because of whitespace normalization in many
plain-text formats and applications, or in XML and HTML, and because of the
special word-breaking behavior of spaces.) I don't see it as a way to
deprecate the existing block of spacing marks.
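A Python sketch (standard library only) of why SPACE is a fragile base. The
compatibility decomposition of U+00B4 ACUTE ACCENT is itself SPACE + COMBINING
ACUTE ACCENT, which NFKD exposes while NFD leaves U+00B4 untouched. And once a
SPACE base is removed by whitespace trimming, as XML or HTML processing may
do, the accent attaches to whatever precedes it. The trimming step is
simulated here with `str.strip()`; real processors differ in exactly which
whitespace they collapse.

```python
import unicodedata

spacing_acute = "\u00b4"
print(unicodedata.decomposition(spacing_acute))                      # <compat> 0020 0301
print(unicodedata.normalize("NFD", spacing_acute) == spacing_acute)  # True
print(unicodedata.normalize("NFKD", spacing_acute) == " \u0301")     # True

# Legacy hack: SPACE as the base of a standalone combining acute.
standalone = " \u0301"
trimmed = standalone.strip()   # simulated whitespace trimming drops the SPACE base
rejoined = unicodedata.normalize("NFC", "a" + trimmed)
print(rejoined == "\u00e1")    # True: the accent now sits on the "a"
```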
This archive was generated by hypermail 2.1.5 : Wed Sep 08 2004 - 02:51:32 CDT