Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri May 30 2003 - 19:47:14 EDT


    From: "Kenneth Whistler" <kenw@sybase.com>
    > That last fact should be taken as a hint that for most
    > purposes, manual leaders should just be sequences of FULL STOP
    > characters (as you will see, for instance in the plain text
    > representations of Internet Drafts or RFCs, for example).
    > But in any rich text format, leaders are styled formatting objects
    > (somewhat similar to tabulations, as Philippe suggested), but
    > that does *not* make U+2024 a format character (LEADER
    > PLACEHOLDER, or whatever). It is exactly what it claims to
    > be: a compatibility character, punctuation, with a single
    > baseline dot as its glyph.

    What surprises me most in the Unicode spec is that it says the purpose of this character is to create leaders of arbitrary length (you say that the spacing statement in the Xerox name was not considered important by Xerox, so how many leaders would be needed to fill an en space under the Unicode designation?). Why then do you insist that it represents one dot? You also seem to insist on the "compatibility" decomposition, which normally removes an important semantic (otherwise it would be canonical).
    All of this seems to create contradictions.
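
    The canonical/compatibility distinction mentioned above can be inspected directly in the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module; the helper name decomposition_kind is hypothetical, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify a character's decomposition mapping as recorded in the UCD."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    # Compatibility mappings carry a tag such as <compat> or <font>;
    # canonical mappings have no tag.
    return "compatibility" if mapping.startswith("<") else "canonical"


# ONE DOT LEADER, a precomposed letter for contrast, and FULL STOP.
for ch in ("\u2024", "\u00e9", "\u002e"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_kind(ch)}")
```

    U+2024 is reported with a compatibility mapping to U+002E, whereas a letter such as U+00E9 has a canonical one and FULL STOP has none.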

    Also, it would be the only punctuation sign whose number of occurrences is not relevant (in dotted lines used as leaders), as the final presentation of the text will need to compensate for font metric differences in order to produce the correct effect (also because the size of the dots was removed from the Unicode designation).

    I do not agree with your argument that it is like a full stop to be used in limited applications (if Unicode wanted to remove the spacing, it was to generalize its use as an abstract character, not to reinforce its mapping to an approximate full stop).

    Compatibility decompositions are not intended to represent exactly the same semantics between the "composed" character and the core base characters in the decomposition. I think that compatibility decompositions are only acceptable fallbacks when the initial character is not supported, but they do not represent the same abstract characters. At least this was true before the decomposition stability "pact", but it is less clear now, as roundtrip convertibility with some encodings is favored over exact character abstraction.
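
    The fallback nature of the compatibility mapping shows up in normalization. The sketch below, again using Python's standard unicodedata module, builds a hypothetical table-of-contents line (the text and the count of ten leaders are illustrative only) and compares NFC with NFKC.

```python
import unicodedata

# Hypothetical table-of-contents entry with ten ONE DOT LEADER characters.
leader_line = "Chapter 1" + "\u2024" * 10 + "12"

nfc = unicodedata.normalize("NFC", leader_line)
nfkc = unicodedata.normalize("NFKC", leader_line)

# NFC applies only canonical mappings, so U+2024 survives unchanged.
print(nfc == leader_line)

# NFKC applies the compatibility mapping: every U+2024 becomes U+002E,
# and nothing in the result records which full stops were leaders.
print("\u2024" in nfkc, nfkc)
```

    Once NFKC has been applied, the leader dots cannot be told apart from ordinary full stops, which is why the mapping works as a fallback rather than as an equivalence.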

    I had never heard of the Xerox CCS before, but there is a large legacy usage of the ellipsis as a single unbreakable character (and the two dots used to notate interval bounds are also unbreakable). The single dot leader looks like a way to fill the gap, only because the two-dot leader and the three-dot ellipsis did not allow, in most fonts and applications, the creation of a regular leader using smaller dots than the one used for the regular full stop punctuation.

    The fact that it was unified with XCCS (with some compromises accepted by Xerox) clearly demonstrates that the Xerox design was not the main focus:
    - Who knows XCCS and uses it? Very few people.
    - Who uses leaders? Every publisher and author of long documents who does not want to see irregularly spaced leaders, or a dotted grid instead of a true dotted horizontal line.

    Leaders are visual helpers for the reader's eye; they have absolutely no punctuation or symbolic semantic (unlike the two-dot symbol or the ellipsis). The fact that it was categorized as punctuation is probably an initial error that can't be corrected, and that comes from the classification of its approximate fallback "compatibility decomposition".
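
    The categorization in question can be checked against the General_Category property. This is a small sketch using Python's standard unicodedata module; the variable name categories is only illustrative.

```python
import unicodedata

# FULL STOP, ONE DOT LEADER, TWO DOT LEADER, HORIZONTAL ELLIPSIS.
dots = ("\u002e", "\u2024", "\u2025", "\u2026")

# General_Category of each character, keyed by the character itself.
categories = {ch: unicodedata.category(ch) for ch in dots}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
```

    All four are listed as Po (Punctuation, other), so the leader characters share the category of the full stop they decompose to.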

    I do not see it as a compatibility character needed for roundtrip conversions with legacy sets (even if XCCS was mapped this way after some compromises). Pure roundtrip conversions respect the initial design of the legacy set from which a character is mapped.

    So you seem to mix up the very distinct concepts of compatibility characters and compatibility decompositions:

    - compatibility characters are for the initial mapping from an important legacy encoding with full roundtrip, and the exact semantic is preserved in this mapping to Unicode. The usage of these Unicode code points is discouraged outside of this legacy usage.

    - for characters that have compatibility decompositions, the decompositions are intended as guides to acceptable fallback characters that will not be too confusing for readers, but the exact semantic is not preserved by the compatibility decomposition. The usage of these characters is not discouraged but instead favored by Unicode, which adds important semantics in the "composed" character.


