Re: Generic base characters

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jul 16 2007 - 14:14:56 CDT

    On 7/16/2007 11:50 AM, Kent Karlsson wrote:
    >
    > Asmus Freytag wrote:
    >
    >>> 1) I'm not so sure about that. It's better to have a single defined
    >>> behaviour (assuming the characters in question are at all supported).
    >>>
    >>
    >> In cases like this, you not only have the question of which *characters*
    >> are supported, but also which *character sequences* are supported. Just
    >> as a font designed for some language other than Swedish might have
    >> glyphs for the f and the j, and support an fi and an fl ligature, yet
    >> not support an fj ligature, other parts of the layout system may
    >> legitimately not support some sequences even if they support each
    >> letter and similar sequences.
    >>
    >
    > While I would appreciate if more fonts supported the fj ligature,
    > I would expect no rendering system or font to insert a dotted
    > circle between an f followed by a j just because they don't
    > support that ligature.
    Neither would I. It would be a cause for extreme customer
    dissatisfaction ;-)
    > Instead they just output an f followed
    > by a j, though the result sometimes is not perfect (but much
    > better than getting a dotted circle in-between).
    >
    This is because this particular fallback is so eminently preferable in
    this example - not (only) because Unicode says so, but from the logic of
    how the script works.
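
    The f + j fallback discussed above can be sketched roughly as follows.
    This is only an illustration: the function name `shape` and the
    two-entry ligature table are assumptions made up for the example, not
    the behaviour of any particular font or layout engine.

```python
# Hypothetical ligature table for a font that supports fi and fl but not fj.
LIGATURES = {
    "fi": "\uFB01",  # LATIN SMALL LIGATURE FI
    "fl": "\uFB02",  # LATIN SMALL LIGATURE FL
}


def shape(text, ligatures=LIGATURES):
    """Return a list of glyph strings for `text`.

    A pair of letters is replaced by a ligature glyph only when the
    (assumed) font table has one; otherwise each letter is emitted on its
    own, and nothing is ever inserted between the letters.
    """
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in ligatures:
            glyphs.append(ligatures[pair])
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape("fjord"))  # f and j stay separate glyphs
    print(shape("fish"))   # fi becomes a single ligature glyph
```

    The unsupported sequence simply degrades to its component letters,
    which is the fallback both sides of this exchange accept.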
    >
    >> This is not a conformance issue but
    >> one of quality and scope of an implementation.
    >>
    >
    > True. But, while formally conforming, it is still a bad idea
    > to start inserting dotted circles where there is none in the input.
    >
    OK. Now that we are not talking about conformance, but design choice, we
    can argue all day which one is better. I don't believe that it is
    a-priori and absolutely a bad idea, but I've made clear that I do see
    some real and some potential issues with this kind of approach depending
    on how it is realized.
    >
    >>> 2) NBSP base is for sequences of combining characters preceded
    >>> by beginning of string or by a control char. I think using NBSP
    >>> as the implicit base in such cases is a reasonable behaviour.
    >>> (Inserting a dotted circle is not.)
    >>>
    >>>
    >> I've always understood that recommendation to be aimed at preventing the
    >> combining mark from being handled in completely weird ways, e.g. by
    >> trying to overhang it into empty space at the beginning of a line. I see
    >>
    >
    > I would say that trying to overhang it into empty space at the
    > beginning of a line is much LESS weird than getting a dotted circle
    > there (where none was in the text).
    >
    >
    Clearly a matter of opinion. The standard suggests that it's better to
    have some base character, and, in the absence of a higher level
    protocol, that this should be the NBSP.
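
    As a rough sketch of that default (not a definitive implementation),
    the following supplies an NBSP base for combining marks that start the
    string or follow a control character. The helper names, and the choice
    of general categories Mn/Mc/Me for "combining" and Cc for "control",
    are assumptions made for illustration. A higher level protocol could
    pass a different `base`.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, used here as the implied base


def is_combining(ch):
    # Assumption: treat general categories Mn, Mc and Me as combining marks.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_control(ch):
    # Assumption: "control char" means general category Cc.
    return unicodedata.category(ch) == "Cc"


def add_implicit_base(text, base=NBSP):
    """Insert `base` before a combining sequence that has no base.

    Only marks at the start of the string or directly after a control
    character are affected. Marks that follow any other character,
    including one from another script, are left untouched.
    """
    out = []
    prev = None
    for ch in text:
        if is_combining(ch) and (prev is None or is_control(prev)):
            out.append(base)
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    # Print code points so the invisible NBSP is easy to see.
    for sample in ("\u0301a", "a\u0301", "x\n\u0308"):
        fixed = add_implicit_base(sample)
        print([f"U+{ord(c):04X}" for c in fixed])
```

    Note that the sketch inserts the base only into the text handed to the
    renderer; whether the stored text should change is a separate question.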
    >> nothing in the standard that prevents a higher level
    >> protocol, such as Uniscribe, from overriding this behavior.
    >>
    >
    > Formally, no, but it is still a bad idea.
    >
    Everyone is entitled to their opinion.
    >
    >>> 3) This thread started talking about there actually being a base
    >>> present in the text just before the combining sequence, just that
    >>> the base was in another script (or some symbol/punctuation).
    >>> That is not an error case from a text rendering point of view.
    >>> There is no reason to start inserting dotted circle, NBSP,
    >>> or anything else. Ligation, kerning, positioning adjustments
    >>> are unlikely to work except for special cases, but some rough
    >>> approximate (assuming again that the individual characters at
    >>> all are supported by the rendering system and font used) should
    >>> be output.
    >>>
    >>>
    >> As I have pointed out, I regard the application of the policy to these
    >> cases as one of the 'issues', because it can lead to unintended (and
    >> limiting) results.
    >>
    >
    > Indeed.
    >
    >
    >
    Here is something we agree on.
    >> But I can understand why layout engine creators don't
    >> want to support an anything-goes approach, because doing that
    >> at *high quality* is extremely expensive.
    >>
    >
    > Yes, but *high quality* for the unexpected cases was not required.
    > Just that one got a reasonable approximation; dotted-circle-less.
    >
    >
    This is a matter of choice/opinion/preference. Something between the
    users and the vendors.
    >> That said, a better way to do the
    >> fallback would be appropriate. John's suggested list of generic bases is
    >> a good way to indicate a minimal level of support.
    >>
    >
    > I agree, for getting better-quality display, but there is still
    > no reason to insert dotted circles for other cases.
    >
    >
    See above.
    >>>> Authors should not have
    >>>> an expectation of portably exchanging buggy text with perfect
    >>>>
    >>>>
    >> A buggy text is one that has missing base characters. That's how I meant
    >> this usage in my post. If you construed that differently based on some
    >> real or perceived deficiency in how I worded that, I'm sorry.
    >>
    >
    > I did not. But that was not the case that this thread was mostly
    > concerned with.
    >
    > ...
    >
    >> But also, indicating that a renderer can't support something *is*
    >> legitimately the business of the implementation. I think that software
    >> that uses fallbacks for diacritics and that can't raise stacked
    >> diacritics properly would be better off causing a visible clash or even
    >> spacing the combining marks than silently overstriking them.
    >> As another example.
    >>
    >
    > That would be better than inserting spurious dotted circles.
    >
    I see that you just can't abide dotted circles ;-)

    A./
    >
    > /kent k