Re: Superscript asterisk

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jun 30 1999 - 13:06:44 EDT


Ricardo wrote:

> But, at least at this moment, I'm worried about a *minimal*
> workset for plain text mathematics. It is in that direction
> that I found the REGULAR ASTERISK to be a necessity (unless you
> convince me to the contrary :-)

I must admit, I find it difficult to follow Ricardo's reasoning,
but I still think it comes down to a few simple points.

1. There are a number of asterisks already encoded in Unicode:

   U+002A ASTERISK
   U+066D ARABIC FIVE POINTED STAR
   U+2217 ASTERISK OPERATOR
   U+229B CIRCLED ASTERISK OPERATOR
   U+FE61 SMALL ASTERISK
   U+FF0A FULLWIDTH ASTERISK

   plus a bunch of dingbat asterisks: U+2722, ..., U+2731, ...
     etc., etc.

   U+002A is just the ordinary "ASCII" asterisk that we know and
   love, ambiguous as to function and form. More on it below.

   U+066D is a politically motivated addition for Arabic. It
   always retains its 5-pointed form, so as not to be reminiscent
   in any way of the Star of David.

   U+2217 is an operator symbol with the math property.

   U+229B is a circled version of U+2217 (comparable to the
   circled versions of other math operators in the same block).

   U+FE61 is a compatibility character for round-trip mapping to
   a Chinese standard.

   U+FF0A is a compatibility character for round-trip mapping to
   various Shift-JIS and other mixed-width CJK character sets.

   The dingbats are just dingbats.
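   The list above can be cross-checked with Python's standard
   `unicodedata` module. This is a sketch only: its data reflects the
   Unicode version bundled with the interpreter, not the 1999 tables,
   and because `unicodedata` has no accessor for the derived Math
   property, general category Sm (math symbol) stands in for it here.

```python
import unicodedata

# The non-dingbat asterisks listed above.
ASTERISKS = ["\u002A", "\u066D", "\u2217", "\u229B", "\uFE61", "\uFF0A"]

for ch in ASTERISKS:
    # General category: "Po" = other punctuation, "Sm" = math symbol
    # (used here as a proxy for the Math property).
    # Decomposition: empty string, or a compatibility mapping such as "<small> 002A".
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<26} "
          f"{unicodedata.category(ch)}  {unicodedata.decomposition(ch) or '-'}")
```

   The two operators come back as Sm and the rest as Po, and only
   U+FE61 and U+FF0A carry a compatibility decomposition (<small> and
   <wide>) back to U+002A, which is what their round-trip role implies.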

2. U+002A is already variable in form and function (as are most
   characters). A quick and easy font survey shows:

   Center of asterisk slightly below x-height: Courier New
   Center of asterisk at x-height: Bookman Old Style
   Center of asterisk slightly above x-height: Arial, Lucida Sans,
                                               Helvetica, Garamond
   Center of asterisk clearly above x-height: Times New Roman, Palatino
   (And the positioning may be tweaked by point-size.)

   6-pointed asterisk: Times New Roman, Garamond, Palatino
   5-pointed asterisk: Courier New, Arial, Bookman Old Style, Lucida Sans,
                       Helvetica

   And the function is obviously quite variable in plain text:
   Multiplication: 2*4=8 (this is the one addressed by U+2217, clearly)
   Exponentiation: 2**10 (alternates with 2^10, and is shown with formal
         superscripting in properly typeset mathematics)
   As part of a compound assignment operator in C: x *= 3;
   Regular expression syntax: terminal ::= (tokena tokenb)*
   Emphasis in email: That is *not* what I meant!
   Convention for emoting in MUDs: *smiles*
   One of a set of annotation marks: note* (cf. use of dagger, double dagger,
         and superscripted numbers and letters for footnotes)
   As part of Pascal comments: (* repeat until done *)
   As part of C comments: /* Post an error message */
   As part of lines for visual separation:
   ********************************************************************
   And I am sure others can come up with other conventional usages.

   Characters are not generally separately encoded by function unless there is
   some very strong reason. The few instances of separation by function
   for mathematical operators were encoded in Unicode to assist in the
   development of mathematical text processing, so as to avoid the
   massive ambiguity associated with the ASCII characters "*" and "-"
   in particular, as well as the Latin-1 middle dot "·".
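   The same split is visible for the other two characters mentioned.
   In the sketch below, pairing each ambiguous character with a
   particular math operator (U+2212 MINUS SIGN, U+22C5 DOT OPERATOR)
   is an illustrative assumption, not a normative mapping; the point is
   only that the operator forms carry category Sm, while their ASCII
   and Latin-1 counterparts are punctuation.

```python
import unicodedata

# Ambiguous ASCII/Latin-1 characters paired with math operator characters.
# The pairing is an illustrative assumption, not a normative mapping.
PAIRS = [
    ("\u002A", "\u2217"),  # ASTERISK     -> ASTERISK OPERATOR
    ("\u002D", "\u2212"),  # HYPHEN-MINUS -> MINUS SIGN
    ("\u00B7", "\u22C5"),  # MIDDLE DOT   -> DOT OPERATOR
]

for plain, op in PAIRS:
    # Print name and general category of each side of the pair.
    print(f"U+{ord(plain):04X} {unicodedata.name(plain)} [{unicodedata.category(plain)}]"
          f"  vs  U+{ord(op):04X} {unicodedata.name(op)} [{unicodedata.category(op)}]")
```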

3. That said, any proposal to encode another asterisk must address the
   cost-benefit issues involved in further disunification of the
   asterisk. Besides all the compatibility forms and dingbats, there
   basically are two asterisks in Unicode already: the normal, ambiguous one
   at U+002A, and the mathematical operator at U+2217.

   If another asterisk were to be added, how would it be distinguished
   from these two that already exist? How would a user know which to
   enter for what circumstances? And how would software deal with the
   functional overlap with usage that already involves the existing
   characters?
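   The last question already arises with the existing characters. As a
   sketch, assuming a search that should treat the variants as one
   asterisk (the helpers `fold_asterisks` and `asterisk_search` are
   hypothetical), NFKC normalization handles the compatibility forms,
   but the application must make its own decision about U+2217, which
   has no compatibility decomposition. Any further asterisk would need
   yet another such decision.

```python
import unicodedata

# Hypothetical application-level folding table. NFKC does not touch
# the math operator, so the application has to decide this itself.
# U+229B CIRCLED ASTERISK OPERATOR is deliberately left unfolded here.
EXTRA_FOLDS = {
    "\u2217": "*",  # ASTERISK OPERATOR
}

def fold_asterisks(text: str) -> str:
    """Fold asterisk variants to U+002A for matching (illustrative only)."""
    # NFKC maps U+FE61 SMALL ASTERISK and U+FF0A FULLWIDTH ASTERISK to U+002A.
    text = unicodedata.normalize("NFKC", text)
    # Apply the application's own extra mappings character by character.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in text)

def asterisk_search(needle: str, haystack: str) -> bool:
    """Substring search that ignores the asterisk variants folded above."""
    return fold_asterisks(needle) in fold_asterisks(haystack)
```

   With this, a search for "*smiles*" also matches "＊smiles＊", and
   "2∗4" (with U+2217) folds to "2*4".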

   Disunifications have considerable costs, and can only be tolerated
   when there is a demonstrable, clear benefit to implementations in
   separating out two clearly delineated uses. There are a few well-known
   examples of what are now generally considered "overunifications" that
   may be addressed in the future: one of these is the distinction between
   a baseline ellipsis... and a centerline ellipsis ···, as used widely
   in East Asian typography. These may now be overunified in U+2026.
   The case for disunification of U+002A, on the other hand, is not
   at all clear.

   Criteria for disunification are now part of the formal Principles
   and Procedures document that guides WG2 in character encoding.
   And any proposal which suggests a disunification must address the
   issues in that document.

--Ken Whistler



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT