From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Aug 25 2009 - 13:52:06 CDT
Peter,
I think that this discussion shows a rather general problem
with the Mc classification. The majority of these characters
are intended to be rendered in a way that is indistinguishable
from ordinary characters (they simply follow the preceding
character).
A few years ago, there was a distinction introduced into the
standard between graphically combining characters and
combining characters by classification. You'll find the details
in the appropriate section of chapter 3.
Yet (too many) renderers continue to implement the Mc
characters using the same machinery as that for graphically
combining characters, including placing limits on the allowed
"base" characters that precede them, inserting a dotted
circle in any context not previously foreseen, and other
such things.
I for one am convinced that the way the Mc classification
was applied was either poorly thought out or altogether
a mistake. However, it now exists in the standard.
In principle, you can take three courses of action. One
is a modified form of 'do nothing': it would document
problems with isolated cases of implementing Mc characters.
The implementers of rendering systems might notice
these nuggets of information, and may make corrections
on a case-by-case basis.
The second is the radical solution: reclassify every single
character from Mc to Lo where there isn't any compelling
reason (in rendering or processing) to consider that
character actually "combining" in function, not just in name.
The advantage of this approach is that it would be very
visible and direct. Treating an "Lo" character by using the
support for graphically combining characters in a
renderer is obviously wrong, so you might expect
pressure on *all* implementations to get that corrected.
The downside, of course, is that it's impossible to predict
what uses the gc=Mc classification has been put to by
actual implementations, outside of simple rendering issues.
You are correct in calling such an approach destabilizing,
no matter how appealing it would otherwise be. For
the same reason, UTC is correct to continue to be
consistent with past practice in assigning Mc to any new
characters that are analogues of existing Mc characters.
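To get a rough sense of how many characters a reclassification
would touch, one could enumerate everything with gc=Mc. The
following is only a sketch: it relies on whatever Unicode version
Python's unicodedata module ships with, and grouping by the first
word of the character name is my own crude stand-in for the script,
not anything defined by the standard. The function name is made up
for this example.

```python
import sys
import unicodedata
from collections import Counter

def count_mc_by_name_prefix():
    """Count gc=Mc characters, grouped by the first word of their name.

    The first word of the name (e.g. DEVANAGARI) is only an
    approximation of the script; it is an assumption of this sketch.
    """
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mc":
            name = unicodedata.name(ch, "UNNAMED")
            counts[name.split()[0]] += 1
    return counts

if __name__ == "__main__":
    counts = count_mc_by_name_prefix()
    print("Unicode data version:", unicodedata.unidata_version)
    print("Total Mc characters:", sum(counts.values()))
    for prefix, n in counts.most_common(10):
        print(prefix, n)
```

This says nothing about the non-rendering uses of gc=Mc in existing
implementations, which is the part nobody can predict.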
The third approach would leave the actual assignments in
place, but achieve the same effect through a highly visible effort to
document the improved understanding of what it means
for a character to have classification Mc.
Unlike the first option, this would not be a case-by-case
annotation of a few problematic characters in diverse
script chapters, but would have to be more up-front.
Wherever combining marks are discussed in the
standard, the distinction between true "graphically
combining" characters and mere notional combining
marks needs to be highlighted and clear implementation
guidelines given (such as "don't use special rendering
for most Mc characters, render them like Lo characters").
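As a minimal sketch of what such a guideline could look like in a
renderer's dispatch logic: only Mn and Me go through the machinery
for graphically combining characters, and Mc falls through to the
ordinary path unless it is on an explicit exception list. The
function name, the mode strings and the exception list are all
hypothetical; the one entry shown (U+093F DEVANAGARI VOWEL SIGN I,
which is displayed before its consonant) is there only to illustrate
the "most" in "most Mc characters".

```python
import unicodedata

# Hypothetical exception list: Mc characters that a renderer may still
# need to handle specially. The contents are illustrative only and are
# not taken from the standard.
SPECIAL_MC = frozenset({"\u093F"})

def render_mode(ch, special_mc=SPECIAL_MC):
    """Pick a rendering path for one character (sketch, not a real API).

    "combine": graphically combining, attach to the preceding base.
    "special": Mc character on the exception list.
    "inline":  lay out like any ordinary (e.g. Lo) character.
    """
    gc = unicodedata.category(ch)
    if gc in ("Mn", "Me"):
        return "combine"
    if gc == "Mc" and ch in special_mc:
        return "special"
    return "inline"

if __name__ == "__main__":
    for ch in ("\u0915", "\u0903", "\u093F", "\u0301"):
        name = unicodedata.name(ch)
        print(f"U+{ord(ch):04X} {name}: {render_mode(ch)}")
```

The point of the default branch is that an unlisted Mc character
gets no base-character restrictions and no dotted circle, exactly
as an Lo character would.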
A similar, high-profile discussion of this belongs in
the FAQ on Indic scripts, and any other publications
likely to be consulted by people implementing fonts
and renderers.
A./