From: Hans Aberg (haberg@math.su.se)
Date: Mon Feb 05 2007 - 11:39:53 CST
On 5 Feb 2007, at 14:40, Michael Maxwell wrote:
> But how are you going to eliminate them "in the first place"? I see
> two choices: either automatically, or by hand. If it can be done
> automatically in the first place, then it could be done
> automatically during parsing.
In the case of punctuation apostrophes, I am not sure it can always
be done automatically, which would be a reason to allow doing it by
hand, that is, by adding such a separate character.
> And I suspect the chances of getting document authors
> to do it right by hand are slim, particularly since the two characters
> would look identical on the screen (at least in a WYSIWYG editor; I
> suppose you could use a character entity in a non-WYSIWYG editor).
> And if people mess up, then the parsing problem is even worse,
> because the parser can't know which of the two characters it should
> be.
There are GUI techniques for checking matching pairs, already used in
editors for computer languages. Typically, when the closing member of
a pair is entered, the opening member is brought into the window and
blinked, or something similar.
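
A minimal sketch of that kind of pair matching, assuming a simple
backward scan with a nesting counter (the names PAIRS and find_opener
are hypothetical, not taken from any particular editor):

```python
# Hypothetical table of closing -> opening characters. U+2019 is listed
# as a closing quote, although the same character is also used as an
# apostrophe.
PAIRS = {
    ")": "(",
    "]": "[",
    "}": "{",
    "\u201d": "\u201c",  # double quotation marks
    "\u2019": "\u2018",  # single quotation marks
}


def find_opener(text, close_index):
    """Return the index of the opener matching text[close_index], or None."""
    closer = text[close_index]
    opener = PAIRS[closer]
    depth = 0  # number of nested closers seen while scanning backwards
    for i in range(close_index - 1, -1, -1):
        if text[i] == closer:
            depth += 1
        elif text[i] == opener:
            if depth == 0:
                return i
            depth -= 1
    return None


if __name__ == "__main__":
    print(find_opener("f(a[1], b)", 9))
    quoted = "\u2018don\u2019t go\u2019"
    # The apostrophe in "don't" is indistinguishable from a closing quote.
    print(find_opener(quoted, 4), find_opener(quoted, 9))
```

With brackets this works as expected. With U+2019 the scan pairs the
apostrophe in "don't" with the opening quote and then finds no opener
for the real closing quote, which is the ambiguity under discussion.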
But if the rendering is identical, it might be difficult to catch
mistakes, which may happen even with lookalikes. For example,
swapping the letter O and the digit 0 may result in a hard-to-catch
error.
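
As an illustration only, a crude check for the O/0 swap could look
like the following. The heuristic and the names is_suspect and
suspicious_tokens are my own assumptions, and it will miss many cases:

```python
import re


def is_suspect(token):
    """Flag a letter O inside a number, or a digit 0 inside a word."""
    # Letter O among digits: the token becomes a pure number once O -> 0.
    if re.search(r"[Oo]", token) and not token.isalpha():
        if re.sub(r"[Oo]", "0", token).isdigit():
            return True
    # Digit 0 among letters: the token becomes a pure word once 0 -> o.
    if "0" in token and not token.isdigit():
        if token.replace("0", "o").isalpha():
            return True
    return False


def suspicious_tokens(text):
    """Return the tokens of text that look like an O/0 mix-up."""
    return [tok for tok in re.findall(r"\w+", text) if is_suspect(tok)]


if __name__ == "__main__":
    print(suspicious_tokens("The total is 1O0 units, see r0om 12 and item 100."))
```

A check like this only works because the two characters have different
code points. If two meanings shared one code point, as with the
apostrophe, there would be nothing for it to look at.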
Compare also the at least two uses of ".": as sentence end marker and
as abbreviation marker. The dots are typeset identically, but the
spaces typeset after them are different, signaling a semantic
difference.
One can then play the game in different ways: there are different
character types, for example input, semantic, and rendering. U+0027
might perhaps be called an input character and U+2019 a rendering
character, with no semantic apostrophe character in the set. This
difference of character types is more apparent in math: U+225D,
"equal to by definition", is clearly a semantic character, whereas
U+2254, "colon equals", which can also be used to indicate a defined
mathematical object, is clearly named for its rendering. In math, the
usage of symbols is in flux, so there is no good universal resolution
of the topic: it must be handled character by character.
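
The character names can be looked up directly, for instance with
Python's standard unicodedata module. This is just a small sketch; the
helper name describe is hypothetical:

```python
import unicodedata

# The four code points discussed above.
CODE_POINTS = [0x0027, 0x2019, 0x225D, 0x2254]


def describe(cp):
    """Return the code point in U+XXXX form followed by its Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```

The name of U+225D states a meaning, while the names of U+2019 and
U+2254 describe a shape or a typographic role.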
Hans Aberg