Torsten Mohrin has given me the perfect reply
to my question, a citation from the Unicode Standard:
"Superscripts and subscripts have been included in the Unicode
Standard only to provide compatibility with existing character
sets. In general, the Unicode character encoding does not attempt
to describe the positioning of a character above or below the
baseline in typographical layout."
It makes sense, although it hurts me and my little asterisk.
To give you some context, my interest in the "superscript
asterisk" comes from plain text mathematics.
Now, I would like to play devil's advocate, so please take apart
the following arguments.
First, let me rename the proposed symbol as REGULAR ASTERISK
instead of SUPERSCRIPT ASTERISK, in accordance with my next
point (and as a trick to reduce the psychological rejection ;)
Second, and most important (observe the broken parallelism with
the Unicode Standard citation above): when I propose REGULAR
ASTERISK I am not attempting to describe the positioning of the
ASCII ASTERISK character, but to describe a new character, "new"
because it has a different application and meaning.
Remember the Unicode slogan: a glyph does not define a character;
the REGULAR ASTERISK character is not necessarily the same as the
ASCII ASTERISK character merely because they are similar in form.
So applying the "superscript rejection rule" is perhaps not so
straightforward.
See, for example, how DIGIT TWO and SUPERSCRIPT TWO are a
different case. The character DIGIT TWO is tied to a
well-defined mathematical concept (a natural number), a precise
meaning for a symbol, and so a good justification for a character.
SUPERSCRIPT TWO relates to the same well-defined mathematical
concept, which does not vary when the glyph stands in a different
position or has a different size. So it does not merit a new
character (except for backwards compatibility).
The same applies to the rest of the digits and to all the letters,
including the Greek letters used in mathematics. I think this
observation answers Kenneth Whistler's argument:
"... Otherwise there would be no end to it: for example, any
math italic variable name can be used as a superscript; likewise
any Greek letter, and so on."
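The relation between DIGIT TWO and SUPERSCRIPT TWO described above
can be observed in the character properties themselves. The following
is only a sketch, assuming a Python interpreter and its standard
unicodedata module; the helper name compat_profile is made up for
this illustration, and the values come from whatever version of the
Unicode Character Database ships with the interpreter.

```python
import unicodedata

def compat_profile(ch):
    """Return (name, decomposition mapping, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),      # empty string if there is none
        unicodedata.normalize("NFKC", ch),  # compatibility-folded form
    )

# DIGIT TWO stands on its own; SUPERSCRIPT TWO carries a <super>
# compatibility mapping back to DIGIT TWO.
for ch in ("2", "\u00b2"):
    print(compat_profile(ch))
```

The `<super>` tag in the mapping says, in effect, "same character as
DIGIT TWO, different position", which is the compatibility status the
Standard's citation describes.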
Third, I want to point out an inconsistency:
ASTERISK OPERATOR was accepted as a new, different character,
in spite of its *identical* form to ASCII ASTERISK, because it
denotes a different meaning.
REGULAR ASTERISK, on the contrary, cannot be accepted as a new,
different character, although it denotes a different meaning,
because of its *similar* form to ASCII ASTERISK.
That is, REGULAR ASTERISK is rejected even though its difference
in form from ASCII ASTERISK supports treating the two as separate
characters with separate meanings better than identity of form
does for ASTERISK OPERATOR.
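For comparison, the two characters that are already encoded can be
inspected side by side. Again this is only a sketch, assuming Python's
standard unicodedata module; the constant and function names are
invented for the illustration, and REGULAR ASTERISK itself cannot
appear here because it is only a proposal.

```python
import unicodedata

ASTERISK = "\u002a"           # the ASCII asterisk
ASTERISK_OPERATOR = "\u2217"  # the mathematical operator

def identity_profile(ch):
    """Return (name, general category, unchanged-by-NFKC?) for a character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.normalize("NFKC", ch) == ch,
    )

for ch in (ASTERISK, ASTERISK_OPERATOR):
    print(f"U+{ord(ch):04X}", identity_profile(ch))
```

Unlike SUPERSCRIPT TWO, ASTERISK OPERATOR has no compatibility mapping
back to ASCII ASTERISK: compatibility normalization leaves both
characters unchanged, and they even sit in different general
categories (punctuation versus math symbol).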
Now see this trap:
The weakness of the second point is this: the less precise the
meaning of ASCII ASTERISK (it is a kind of multi-purpose gadget),
the weaker its differentiation from REGULAR ASTERISK (and the
more sense it makes to apply the "superscript rejection rule").
The counter-argument: the greater that weakness of the second
point, the greater the ASTERISK OPERATOR inconsistency. (The less
precise the meaning of ASCII ASTERISK, the more precarious the
"ASTERISK OPERATOR versus ASCII ASTERISK" distinction.)
REGULAR ASTERISK definition
---------------------------
To be more precise, the following is the meaning I intend
for REGULAR ASTERISK.
1) As a mathematical symbol in regular expressions, denoting
"the marked (left-adjacent) symbol or expression may be
replicated an arbitrary number of times, including zero
times"
* For example, let caret (^) stand for REGULAR ASTERISK
in the following regular expression:
0^(1)
this regular expression denotes the set of expressions
{1, 01, 001, 0001, ... }
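In most regular expression engines the ASCII asterisk plays the role
proposed here for REGULAR ASTERISK, so the example above (where the
caret is only a stand-in) corresponds to the pattern 0*1. A minimal
sketch, assuming Python's standard re module; the list of candidate
strings is made up for the illustration:

```python
import re

# "0*1": zero or more replications of "0", followed by a single "1".
pattern = re.compile(r"0*1")

candidates = ["1", "01", "001", "0001", "10", "0", ""]

# fullmatch() requires the whole string to belong to the denoted set.
members = [s for s in candidates if pattern.fullmatch(s)]
print(members)
```

Only the strings made of some (possibly zero) "0" characters followed
by one "1" are kept; "10", "0" and the empty string are rejected.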
The suggested form for REGULAR ASTERISK (neither defining nor
mandatory) is that of an asterisk in the position and size of a
superscript. A good model to imitate is the position and size of
the PRIME character.
For better separation of form and meaning, one could propose
other names that avoid the word "ASTERISK", e.g.
"REGULAR EXPRESSION MARK FOR ZERO OR MORE REPLICATIONS"
(perhaps too long, but you see what I mean)
Bibliography for the above-mentioned meaning of REGULAR ASTERISK:
Hopcroft & Ullman [1979]
"Introduction to Automata Theory, Languages, and Computation"
Addison-Wesley
Open that book at any page at random; chances are you'll find
that charming little asterisk.
And more: the acid test for pseudo-superscripts.
------------------------------------------------
Take a pair (base symbol, suspect superscript),
* rename the suspect so as not to bias the test
(e.g. don't call it superscript),
* change the form of the suspect and
* compare the meanings of the two symbols, asking whether
the original relation between their meanings still
holds
Example (a)
Take LETTER A and (imaginary) SUPERSCRIPT LETTER A
* rename the second as CURIOUS THING
* change the form of CURIOUS THING to, say, a little white
square
* can CURIOUS THING, in its form of a little white square,
maintain its relation with LETTER A?
No! The CURIOUS THING would << need a "connection in form" >>
to its parent LETTER A to have a related personality.
Note that if we also change the form of LETTER A to a
white square, then we can assign the same personality
to both symbols (although in bizarre forms); they would
be BIZARRE LETTER A and SUPERSCRIPT BIZARRE LETTER A,
but could maintain the binary relation "to be, both, LETTER A".
Example (b)
Take DIGIT TWO and SUPERSCRIPT TWO
* rename the second as PRETTY MATTER
* change the form of PRETTY MATTER to, say, two little vertical
bars
* the two little bars can still express the mathematical concept
"natural number two", but the connection of meaning with DIGIT
TWO is lost (or at least it is very poorly expressed);
These symbols << need a connection in form >>, since without
that connection we feel that "there is an inconsistency in using
different metaphors (forms) for the same concept"
Example (c)
Take ASCII ASTERISK and REGULAR ASTERISK
* rename the second as REGULAR EXPRESSION MARK FOR ZERO OR MORE
REPLICATIONS
* change the form of "REGULAR EXPR..." to, say, a caret (^)
* then note that, in the same context, both the conventional
ASCII ASTERISK and the new "REGULAR EXPR..." can be used with
their intended meanings << without needing a connection in
form >>, because they have different, unrelated meanings.
Since the original connection was only about forms, not about
meanings, the separation of forms doesn't hurt.
Please look above in this letter and you will see that
I have just done that (used both (*) and (^)) in
definition (1) of REGULAR ASTERISK.
Expanding the uses/sense of REGULAR ASTERISK.
---------------------------------------------
Just as the PLUS character finds uses that do not express
"numeric addition" but keep the spirit of "addition"
(e.g. string concatenation), we can find, for
REGULAR ASTERISK, uses that keep the spirit of "REGULAR
EXPRESSION MARK FOR ZERO OR MORE REPLICATIONS". The following
is an example of such uses, the second sense I give to
REGULAR ASTERISK:
2) As a PRIME-like qualifier, denoting that "the marked
(left-adjacent) variable (variable in the mathematical sense)
can be instantiated as some constant mathematical object
(e.g. a list) that has a character of multiplicity, namely
multiplicity with zero or more replications" (e.g. a
"possibly empty" list).
Note that the former sense (1) works with constants, while this
second one works with variables.
Note also that the "spirit" is maintained, but there could be
some senses where this "spirit" is relaxed (for example, the
condition "zero or more replications" could be ignored).
I think these are benign cases.
Coming back to my motivation:
-----------------------------
(Just one point now.) Note that, for plain text mathematics, the
absence of superscript styling can easily be worked around with
expressions like (N^2). But what about the asterisk? (N^*)? That
workaround only works with numbers and letters. Is there any
substitute?
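The asymmetry can be made concrete with a small sketch. It assumes
Python, and the names SUPERSCRIPT_DIGITS and raise_to are invented
for this illustration. Digits have encoded superscript forms, so a
numeric exponent can be written directly; anything else falls back to
the caret convention, which for the asterisk gives exactly the (N^*)
form questioned above.

```python
# Superscript digits that are encoded in Unicode (kept for compatibility).
SUPERSCRIPT_DIGITS = {
    "0": "\u2070", "1": "\u00b9", "2": "\u00b2", "3": "\u00b3",
    "4": "\u2074", "5": "\u2075", "6": "\u2076", "7": "\u2077",
    "8": "\u2078", "9": "\u2079",
}

def raise_to(base, exponent):
    """Write base with a superscript exponent in plain text.

    Uses superscript digit characters when the exponent is all digits,
    otherwise falls back to the parenthesized caret convention.
    """
    if exponent and all(ch in SUPERSCRIPT_DIGITS for ch in exponent):
        return base + "".join(SUPERSCRIPT_DIGITS[ch] for ch in exponent)
    return f"({base}^{exponent})"

print(raise_to("N", "2"))  # superscript digit available
print(raise_to("N", "*"))  # no superscript form in this table, caret fallback
```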
A lot of plain text mathematics can be done with Unicode symbols.
If my humble opinion as a computer engineer biased towards
mathematics counts for something, I think the Unicode Consortium
has done a really good job.
I think it would be foolish to keep adding mathematical Unicode
characters ad infinitum. Of course, it would be great fun to have
a big repertoire of mathematical symbols, I don't deny that.
But, at least for now, I am concerned with a *minimal*
working set for plain text mathematics. It is in that direction
that I found REGULAR ASTERISK to be a necessity (unless you
convince me to the contrary :-)
Well, that's all I wanted to say (for the moment...)
Nice to be on this list.
Ricardo Bermell-Benet <rbermell@aimplas.es>