From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Mon Jul 16 2007 - 11:18:00 CDT
Asmus Freytag wrote:
> Such a missing base character is a bug in the text.
Yes, but...
> Despite
> the recommended fallback that you describe, the policy of
> making that visible to the author by inserting a dotted
> circle is, in principle, reasonable.
1) I'm not so sure about that. It's better to have a single defined
behaviour (assuming the characters in question are at all supported).
2) NBSP base is for sequences of combining characters preceded
by beginning of string or by a control char. I think using NBSP
as the implicit base in such cases is a reasonable behaviour.
(Inserting a dotted circle is not.)
3) This thread started out discussing the case where a base actually
is present in the text just before the combining sequence, only
that the base is in another script (or is some symbol/punctuation).
That is not an error case from a text rendering point of view.
There is no reason to start inserting dotted circle, NBSP,
or anything else. Ligation, kerning, and positioning adjustments
are unlikely to work except for special cases, but some rough
approximation (assuming again that the individual characters are
supported at all by the rendering system and font used) should
be output.
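As a rough illustration of points 2 and 3, the rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name add_implicit_base is hypothetical, "combining character" is approximated as any code point with general category M*, "control char" as category Cc, and the implicit base is modelled as a string transformation, whereas a real renderer would supply it internally without changing the text.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, used here as the implicit base


def is_mark(ch):
    # Assumption: treat any general category M* (Mn, Mc, Me) as combining.
    return unicodedata.category(ch).startswith("M")


def is_control(ch):
    # Assumption: "control char" means general category Cc.
    return unicodedata.category(ch) == "Cc"


def add_implicit_base(text):
    """Model an implicit NBSP base for combining sequences that start
    the string or directly follow a control character.

    Any other preceding character, whatever its script, is left alone
    and simply acts as the base (point 3).
    """
    out = []
    prev = None
    for ch in text:
        if is_mark(ch) and (prev is None or is_control(prev)):
            out.append(NBSP)
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, a string beginning with U+0301 gets an NBSP in front of it, while "x" or "_" followed by U+0301 is returned unchanged, since a base is already present.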
> Authors should not have
> an expectation of portably exchanging buggy text with perfect
What is "buggy text" (from a rendering engine+font point of view)?
Can you give me an example in English? Writing moooose may be an
error (or maybe not), but the rendering engine+font should not care.
It may be a spelling error, but it is in no way "buggy text" at
the rendering level. Writing <DEVANAGARI LETTER MA, DEVANAGARI
VOWEL SIGN O, DEVANAGARI VOWEL SIGN O, DEVANAGARI VOWEL SIGN O>
may be an error (or maybe not), but the rendering engine+font
should not care. It may be a spelling error, but it is in no way
"buggy text" at the rendering level. There is a problem with
above/below combining characters in that proper stacking will
quickly go outside of the line or even page boundary. But that
is a problem of a different kind, and not buggy text per se.
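To make the comparison concrete, here is a small Python sketch that lists the character names and general categories of the two examples above. The helper names describe and has_explicit_base are hypothetical, and "has a base" is simplified to "does not start with a category M* character".

```python
import unicodedata


def describe(text):
    # Pair each code point with its Unicode name and general category.
    return [(unicodedata.name(ch), unicodedata.category(ch)) for ch in text]


def has_explicit_base(text):
    # Simplification: the text has a base if it does not begin with a
    # combining mark (general category M*).
    return bool(text) and not unicodedata.category(text[0]).startswith("M")


# <DEVANAGARI LETTER MA, DEVANAGARI VOWEL SIGN O x 3>
devanagari = "\u092E\u094B\u094B\u094B"

for name, category in describe(devanagari):
    print(category, name)

print(has_explicit_base("moooose"), has_explicit_base(devanagari))
```

At this level both strings look the same to a renderer: a base followed by more characters. Nothing in the character properties marks the repeated vowel sign as an error, just as nothing marks the repeated "o".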
> fidelity, so making them aware of the problem leads to more
> robust interchange.
Which problem? Indicating such things as spelling errors is not
the business of the low level text renderer.
> Now, there are several problems with this approach (depending
> on how it is implemented).
>
> If the policy leads to authors creating didactic texts that
> rely on the presence of the dotted circle, that is a problem.
>
> If the implementation prevents users from specifying some
> other reasonable base character, and insists to show a dotted
> circle nevertheless, that prevents users from creating
> reasonable texts, limiting the functionality of the
A rendering system should have no opinion on what text is
"reasonable" or not. I can understand that, for items that are
100% confusable and that *should have been* canonically
equivalent (but aren't), one of the representations is rendered
blurred (in some way, e.g. by using some error glyphs).
But that needs to be defined in the standard so that everyone
makes the same choice of which *should-have-been* equivalent
to blur out, and which to show without blurring. But that is
not the case for, say, <underline, any combining visible character>,
nor for <latin letter small x, any combining visible character>.
Nor for <lao consonant, lao combining vowel, lao combining vowel,
lao combining vowel>.
However, <LAO VOWEL SIGN E, LAO VOWEL SIGN E> is 100% confusable
with LAO VOWEL SIGN EI, but they are not canonically equivalent
(as they should have been) so one needs to be blurred to avoid
spoofing possibilities. But neither of these characters is
combining!
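The two claims that can be checked mechanically (no canonical equivalence, and neither character being combining) can be tested with Python's standard unicodedata module, as in the sketch below. The helper name canonically_equivalent is hypothetical, and the results depend on the Unicode data shipped with the Python build in use. The visual confusability itself is a property of fonts and cannot be checked this way.

```python
import unicodedata

LAO_E = "\u0EC0"   # LAO VOWEL SIGN E
LAO_EI = "\u0EC1"  # LAO VOWEL SIGN EI


def canonically_equivalent(a, b):
    # Two strings are canonically equivalent when their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


doubled = LAO_E + LAO_E

# Is <E, E> canonically equivalent to EI?
print(canonically_equivalent(doubled, LAO_EI))

# General category and canonical combining class of each character.
for ch in (LAO_E, LAO_EI):
    print(unicodedata.name(ch), unicodedata.category(ch),
          unicodedata.combining(ch))
```

For contrast, canonically_equivalent("\u00E9", "e\u0301") shows what a pair that is canonically equivalent looks like under the same test.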
> implementation. Particularly egregious if an implementation
> prevents the user from providing a code point for the dotted
> circle explicitly.
> > There is no notion of "invalid"/"valid" base character for
> > a combining character in Unicode.
> >
> But there is also no notion that an implementation has to
> support *all* sequences of characters. It is desirable to
> create implementations that don't get in the way of the
> users' needs, but in some cases, limiting the capabilities
> results in a more stable, more easily tested implementation
> that can deliver the *intended* support more correctly and at
> times also more cheaply.
I'm not sure exactly what this refers to. But I understand that
not everything can be tested. Do spurious dotted circles mean
"we did not test this"?
/kent k