From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Nov 15 2005 - 13:12:23 CST
On 11/14/2005 9:23 PM, Curtis Clark wrote:
> On 2005-11-14 13:28, Kenneth Whistler wrote:
>
>> U+22BB XOR and U+22BC NAND bear a superficial resemblance,
>> but are pretty clearly not the same symbols.
>
>
> If I understand correctly, their semantics are rather different as
> well. :-)
>
Semantics is a tricky thing. When the depicted image is identical, you
can have the case of an alternate use of the *same* character, rather
than the case of independent use of a *different* character.
The classic example is the . (PERIOD/FULL STOP). It gets used as
sentence punctuation, abbreviation mark or decimal point, as well as
leader dots and ellipses. We decided long ago not to disunify these
uses (i.e. not to consider them separate characters), with the exception
of the ellipsis and leader dots.
The other example is the right single quote / apostrophe. Again, we
explicitly document that U+2019 fulfills both functions (the modifier
letter apostrophe, U+02BC, notwithstanding).
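To make these two examples concrete, here is a minimal sketch using
Python's standard unicodedata module (the choice of Python and the
variable names are assumptions made for illustration). It shows that the
one dot leader and the ellipsis have their own code points, related to
the period only through compatibility mappings, and that U+2019 and the
modifier letter apostrophe differ in general category:

```python
import unicodedata

# The period and two characters that were disunified from it.
dots = {
    "period": "\u002E",
    "one dot leader": "\u2024",
    "ellipsis": "\u2026",
}
for label, ch in dots.items():
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization folds the ellipsis back into periods,
# which is how the relationship between the two is recorded.
ellipsis_folded = unicodedata.normalize("NFKC", dots["ellipsis"])

# U+2019 is punctuation; U+02BC is a modifier letter.
quote_category = unicodedata.category("\u2019")
modifier_category = unicodedata.category("\u02BC")
print(repr(ellipsis_folded), quote_category, modifier_category)
```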
In some ways, with these characters, our hands were tied by legacy -
U+002E and U+2019 are used in just that ambiguous way in legacy data,
and absent a visual distinction, an uncomplicated conversion from legacy
does not invalidate the displayed text.
Even where legacy does not come into play, the fact that two symbols
look identical does matter. It makes it so much more difficult for
users to pick the correct one, and it raises potential issues with
spoofing (or simple accidental misinterpretation, where the software
'knows' what the character is (by its code), but the user 'thinks' it's
the other one (from context)). [Of course, a mere superficial
resemblance is never grounds for unification - which is why Ken
highlighted that fact in the current discussion.]
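The point about software 'knowing' a character by its code can be
sketched in a few lines, again assuming Python's standard unicodedata
module (the variable names are made up for this example). The two
symbols from this thread compare unequal and carry different names,
whatever a reader takes them for on screen:

```python
import unicodedata

xor_sign = "\u22BB"   # U+22BB
nand_sign = "\u22BC"  # U+22BC

# Software distinguishes the two purely by code point ...
same_character = xor_sign == nand_sign

# ... and the character names record which is which.
names = [unicodedata.name(ch) for ch in (xor_sign, nand_sign)]
print(same_character, names)
```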
The simpler a glyph is (particularly for symbols or punctuation), the
less likely it is to show consistent variation in *any* font. (More
complicated shapes provide more room for artistic re-interpretation by a
font designer.) Therefore, simple shapes should be more thoroughly
scrutinized for unification. In that sense, a symbol looking like a NAND
or XOR symbol would be more suspect than a complicated arrangement of
curlicues and loops.
The rules are also different for characters with strong script
membership. First, keyboards and other input methods would tend to
constrain the user to use the character that's appropriate in the
context of that script, and not an unrelated character of the same
appearance. Second, over time, and for some fonts, the shape for the
character may acquire deviations that are unrelated to the typical range
of glyphs for the 'lookalike' character.
In determining the visual appearance of symbols and punctuation, in
particular the latter, it's very important to consider not only the
'ink' but also the location of the ink in the cell: is it raised or
lowered, does it have more or less, or even asymmetric space around it?
If positioning and spacing are different, it's more likely a different
character, unless the spacing is systematically applied in a given *use*
of a character. Here, legacy issues come into play. In East Asian texts,
the left angle bracket would have a large amount of white space on the
left (to fit into a square cell). Legacy fonts built that space into the
glyphs (although layout engines could have supplied it). Because of the
prevailing legacy use, it made sense to disunify that angle bracket from
the generic one (for mathematical use), even though the semantics
(delimiter) were the same.
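The result of that disunification can be inspected directly. The sketch
below assumes that the East Asian bracket meant here is U+3008 and the
mathematical one is U+27E8 (the code points are not named above, so this
pairing is an assumption), and uses Python's standard unicodedata module
to compare their East Asian Width properties:

```python
import unicodedata

cjk_bracket = "\u3008"   # assumed: the East Asian left angle bracket
math_bracket = "\u27E8"  # assumed: the mathematical left angle bracket

# Map each character name to its East Asian Width property value.
widths = {}
for ch in (cjk_bracket, math_bracket):
    widths[unicodedata.name(ch)] = unicodedata.east_asian_width(ch)

for name, width in widths.items():
    print(f"{name}: {width}")
```

The differing width values reflect the spacing distinction described
above: the ink is much the same, but the cell it sits in is not.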
A./