Ricardo wrote:
> But, at least at this moment, i'm worried about a *minimal*
> workset for plain text mathematics. Is in that direction
> when i found the REGULAR ASTERISK as a necessity (unless you
> convince me onto the contrary :-)
I must admit, I find it difficult to follow Ricardo's reasoning,
but I still think it comes down to a few simple points.
1. There are a number of asterisks already encoded in Unicode:
U+002A ASTERISK
U+066D ARABIC FIVE POINTED STAR
U+2217 ASTERISK OPERATOR
U+229B CIRCLED ASTERISK OPERATOR
U+FE61 SMALL ASTERISK
U+FF0A FULLWIDTH ASTERISK
plus a bunch of dingbat asterisks: U+2722, ..., U+2731, ...
etc., etc.
U+002A is just the ordinary "ASCII" asterisk that we know and
love, ambiguous as to function and form. More on it below.
U+066D is a politically motivated addition for Arabic, which
always retains its 5-pointed form, so as not to be reminiscent
in any way of the Star of David.
U+2217 is an operator symbol with the math property.
U+229B is a circled version of U+2217 (comparable to the
circled versions of other math operators in the same block).
U+FE61 is a compatibility character for round-trip mapping to
a Chinese standard.
U+FF0A is a compatibility character for round-trip mapping to
various Shift-JIS and other mixed-width CJK character sets.
The dingbats are just dingbats.
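For anyone who wants to check these properties rather than take my word for it, here is a minimal sketch using Python's standard unicodedata module. The list ASTERISKS and the helper describe() are names made up for this illustration, and the exact output depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Code points named in the list above, plus two of the dingbat asterisks.
ASTERISKS = [0x002A, 0x066D, 0x2217, 0x229B, 0xFE61, 0xFF0A, 0x2722, 0x2731]

def describe(cp):
    """Return (name, general category, decomposition mapping) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        # An empty mapping means the character has no decomposition.
        unicodedata.decomposition(ch) or "-",
    )

if __name__ == "__main__":
    for cp in ASTERISKS:
        name, category, decomp = describe(cp)
        print(f"U+{cp:04X}  {category}  {decomp:<13} {name}")
```

The two compatibility characters should show a decomposition back to 002A, while U+2217 and U+229B should show up with the math symbol category (Sm) and no decomposition at all.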
2. U+002A is already variable in form and function (as are most
characters). A quick and easy font survey shows:
Center of asterisk slightly below x-height: Courier New
Center of asterisk at x-height: Bookman Old Style
Center of asterisk slightly above x-height: Arial, Lucida Sans,
Helvetica, Garamond
Center of asterisk clearly above x-height: Times New Roman, Palatino
(And the positioning may be tweaked by point-size.)
6-pointed asterisk: Times New Roman, Garamond, Palatino
5-pointed asterisk: Courier New, Arial, Bookman Old Style, Lucida Sans,
Helvetica
And the function is obviously quite variable in plain text:
Multiplication: 2*4=8 (this is the one addressed by U+2217, clearly)
Exponentiation: 2**10 (alternates with 2^10, and shown with formal superscripting
in properly typeset mathematics)
As part of compound assignment operator in C: x *= 3;
Repetition in regular expression and grammar syntax: terminal ::= (tokena tokenb)*
Emphasis in email: That is *not* what I meant!
Convention for emoting in MUDs: *smiles*
One of a set of annotation marks: note* (cf. use of dagger, double dagger,
and superscripted numbers and letters for footnotes)
As part of Pascal comments: (* repeat until done *)
As part of C comments: /* Post an error message */
As part of lines for visual separation:
********************************************************************
And I am sure others can come up with other conventional usages.
Characters are not generally separately encoded by function unless there is
some very strong reason. The few instances of separation by function
for mathematical operators were encoded in Unicode to assist in the
development of mathematical text processing, so as to avoid the
massive ambiguity associated with the ASCII characters "*" and "-"
in particular, as well as the Latin-1 middle dot "·".
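To make the ambiguity concrete, here is a rough sketch of what a program has to do to guess the function of U+002A: it has to look at context, because the code point itself says nothing. The pattern table ROLE_PATTERNS and the function guess_role() are hypothetical and deliberately incomplete; this is an illustration of the problem, not a recommended parser.

```python
import re

# Hypothetical, incomplete heuristics. Order matters: the first match wins.
ROLE_PATTERNS = [
    ("separator line", re.compile(r"^\*{4,}$")),
    ("comment delimiter", re.compile(r"/\*|\*/|\(\*|\*\)")),
    ("exponentiation", re.compile(r"\d\s*\*\*\s*\d")),
    ("compound assignment", re.compile(r"\*=")),
    ("multiplication", re.compile(r"\d\s*\*\s*\d")),
    ("emphasis or emote", re.compile(r"(?<!\w)\*\w[^*]*\*(?!\w)")),
    ("repetition", re.compile(r"\)\*")),
    ("annotation mark", re.compile(r"\w\*(\s|$)")),
]

def guess_role(text):
    """Guess how U+002A is being used in text, from surrounding context only."""
    for role, pattern in ROLE_PATTERNS:
        if pattern.search(text):
            return role
    return "unknown"

if __name__ == "__main__":
    samples = ["2*4=8", "2**10", "x *= 3;", "terminal ::= (tokena tokenb)*",
               "That is *not* what I meant!", "*smiles*", "note*",
               "(* repeat until done *)", "/* Post an error message */",
               "*" * 20]
    for sample in samples:
        print(f"{guess_role(sample):<20} {sample}")
```

Every sample uses the same character; only the context differs, and the heuristics are easy to fool.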
3. That said, any proposal to encode another asterisk must address the
cost-benefit issues involved in further disunification of the
asterisk. Besides all the compatibility forms and dingbats, there
basically are two asterisks in Unicode already: the normal, ambiguous one
at U+002A, and the mathematical operator at U+2217.
If another asterisk were to be added, how would it be distinguished
from these two that already exist? How would a user know which to
enter for what circumstances? And how would software deal with the
functional overlap with usage that already involves the existing
characters?
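One cost is easy to demonstrate with the two asterisks we already have. In the sketch below (Python, standard unicodedata module), compatibility normalization folds the fullwidth asterisk back to U+002A, but U+2217 has no decomposition, so a search for "2*4" will not find text entered with the operator unless the software adds its own folding. The table SEARCH_FOLD and the function search_key() are hypothetical names for such application-level folding, not part of any standard.

```python
import unicodedata

# Hypothetical application-level fold: treat the math operator like U+002A
# for searching only. Nothing in Unicode normalization does this for you.
SEARCH_FOLD = str.maketrans({"\u2217": "*"})

def search_key(text):
    """Build a loose matching key: NFKC first, then the extra asterisk fold."""
    return unicodedata.normalize("NFKC", text).translate(SEARCH_FOLD)

if __name__ == "__main__":
    ascii_form = "2*4"
    operator_form = "2\u22174"    # uses U+2217 ASTERISK OPERATOR
    fullwidth_form = "2\uFF0A4"   # uses U+FF0A FULLWIDTH ASTERISK

    print(unicodedata.normalize("NFKC", fullwidth_form) == ascii_form)
    print(unicodedata.normalize("NFKC", operator_form) == ascii_form)
    print(search_key(operator_form) == search_key(ascii_form))
```

Each further asterisk would need another entry in every such table, in every piece of software that cares.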
Disunifications have considerable costs, and can only be tolerated
when there is a demonstrable, clear benefit to implementations in
separating out two clearly delineated uses. There are a few well-known
examples of what are now generally considered "overunifications" that
may be addressed in the future: one of these is the distinction between
a baseline ellipsis (...) and a centerline ellipsis (···), as used widely
in East Asian typography. These may now be overunified in U+2026.
The case for disunification of U+002A, on the other hand, is not
at all clear.
Criteria for disunification are now part of the formal Principles
and Procedures document that guides WG2 in character encoding.
Any proposal that suggests a disunification must address the
issues in that document.
--Ken Whistler