Re: Some Missing Astrological Symbols

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Nov 15 2005 - 13:12:23 CST


    On 11/14/2005 9:23 PM, Curtis Clark wrote:

    > On 2005-11-14 13:28, Kenneth Whistler wrote:
    >
    >> U+22BB XOR and U+22BC NAND bear a superficial resemblance,
    >> but are pretty clearly not the same symbols.
    >
    >
    > If I understand correctly, their semantics are rather different as
    > well. :-)
    >
    Semantics is a tricky thing. When the depicted image is identical, you
    can have the case of an alternate use of the *same* character, rather
    than the case of independent use of a *different* character.

    The classic example is the . (PERIOD/FULL STOP). It gets used as
    sentence punctuation, abbreviation mark or decimal point, as well as
    leader dots and ellipses. We decided long ago not to disunify these
    uses (i.e. not to consider them separate characters), with the
    exception of the ellipsis and leader dots.

    The other example is the right single quote / apostrophe. Again, we
    explicitly document that U+2019 fulfills both functions
    (notwithstanding the modifier letter apostrophe).

    In some ways, with these characters, our hands were tied by legacy -
    U+002E and U+2019 are used in just that ambiguous way in legacy data,
    and absent a visual distinction, an uncomplicated conversion from legacy
    does not invalidate the displayed text.
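
    As a rough illustration of the unification decisions described above,
    the following Python sketch lists the unified characters next to the
    ones that did get separate code points. It assumes only the standard
    unicodedata module; the choice of U+2024, U+2026 and U+02BC as the
    disunified counterparts, and the helper name describe, are mine and
    purely illustrative.

```python
import unicodedata

# U+002E and U+2019 each cover several uses (the unified cases).
# U+2024, U+2026 and U+02BC are separately encoded characters
# (leader dot, ellipsis, modifier letter apostrophe).
CODE_POINTS = [0x002E, 0x2024, 0x2026, 0x2019, 0x02BC]


def describe(cp):
    """Return 'U+XXXX NAME (General_Category)' for one code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for cp in CODE_POINTS:
    print(describe(cp))
```

    Note that nothing in the character data distinguishes a decimal point
    from a sentence-final period: both are simply U+002E, and the
    distinction lives in the surrounding text.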

    Even where legacy does not come into play, the fact that two symbols
    look identical does matter. It makes it so much more difficult for
    users to pick the correct one, and it raises potential issues with
    spoofing (or simple accidental misinterpretation, where the software
    'knows' what the character is (by its code), but the user 'thinks' it's
    the other one (from context)). [Of course, a mere superficial
    resemblance is never grounds for unification - which is why Ken
    highlighted that fact in the current discussion.]
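
    The point that software 'knows' a character only by its code can be
    sketched in a few lines of Python. This is a hedged illustration
    using the standard unicodedata module; the function name
    same_character is hypothetical, and comparing after NFKC
    normalization is just one plausible way software might decide
    identity.

```python
import unicodedata

XOR = "\u22bb"   # U+22BB
NAND = "\u22bc"  # U+22BC


def same_character(a, b):
    """Compare two characters the way software does: by code point,
    after compatibility normalization (NFKC), never by appearance."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(unicodedata.name(XOR), unicodedata.name(NAND))
print(same_character(XOR, NAND))
```

    However similar the two glyphs may look in a given font, the
    comparison treats them as distinct; any confusion arises only on the
    user's side.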

    The simpler a glyph is (particularly for symbols or punctuation), the
    less likely it is to show consistent variation in *any* font. (More
    complicated shapes provide more room for artistic re-interpretation by a
    font designer). Therefore, simple shapes should be more thoroughly
    scrutinized for unification. In that sense, a symbol looking like a NAND
    or XOR symbol would be more suspect than a complicated arrangement of
    curlicues and loops.

    The rules are also different for characters with strong script
    membership - for one, keyboard and other input methods would tend to
    constrain the user to use the character that's appropriate in the
    context of that script, and not an unrelated character of the same
    appearance. Second, over time, and for some fonts, the shape for the
    character may acquire deviations that are unrelated to the typical range
    of glyphs for the 'lookalike' character.

    In determining the visual appearance of symbols and punctuation, in
    particular the latter, it's very important to consider not only the
    'ink' but also the location of the ink in the cell: is it raised or
    lowered, does it have more or less, or even asymmetric space around it?
    If positioning and spacing are different, it's more likely a different
    character, unless the spacing is systematically applied in a given *use*
    of a character. Here, legacy issues come into play. In East Asian texts,
    the left angle bracket would have a large amount of white space on the
    left (to fit into a square cell). Legacy fonts built that space into the
    glyphs (although layout engines could have supplied it). Because of the
    prevailing legacy use, it made sense to disunify that angle bracket from
    the generic one (for mathematical use), even though the semantics
    (delimiter) were the same.
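
    The angle bracket case can be inspected with a short Python sketch.
    It assumes the standard unicodedata module, and it assumes that the
    two characters meant above are U+3008 (East Asian) and U+27E8
    (mathematical); that mapping, and the helper name bracket_info, are
    my reading rather than something stated in this message.

```python
import unicodedata

BRACKETS = {
    "East Asian": "\u3008",    # U+3008
    "mathematical": "\u27e8",  # U+27E8 (assumed to be the "generic" one)
}


def bracket_info(ch):
    """Return (code point, character name, East_Asian_Width) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )


for label, ch in BRACKETS.items():
    print(label, bracket_info(ch))
```

    The East_Asian_Width property records the legacy cell-width
    difference: the East Asian bracket is classified as wide, which
    reflects the white space that legacy fonts built into the glyph.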

    A./


