L2/00-249
From: Karlsson Kent - keka [keka@im.se]
Sent: Thursday, August 03, 2000 6:49 AM
To: Multiple Recipients of Unicore
Subject: RE: UTC Agenda item: Mathematical Letter Symbols
Regarding the "math alphanumeric characters" proposal
-----------------------------------------------------------------------------
 
I've finally got some time to comment on this issue.  I've been too busy
editing a somewhat math-oriented document which does make distinctions
between upright non-bold, bold, and italic versions of the same sequence
of letters, as well as between bold and non-bold versions of the same
symbols (for plus, minus, and infinity, as it happens).  It also uses
multi-letter identifiers in math expressions.  That the identifiers are
multi-letter is important: the document would be unreadable if
single-letter identifiers had been used throughout.

I'm very strongly opposed to the "math alphanumeric characters" proposal.
As someone who would be a 'user' of the "math alphanumeric characters" if
they were to be accepted and then used in e.g. MathML, I very much fear
the problems that will result: problems setting/changing variety,
problems with searches, problems getting the desired identifiers in the
desired variety.  E.g., I might not be able to get a bold "oändlig", or
would at least have severe problems in finding and using a work-around.
This is not an unrealistic example: the document I've been busy with has
the bold identifier "infinitary" (in math expressions!).  If I were to
translate the document to Swedish, that would be a bold "oändlig".  And
the "math alphanumeric characters" do not allow me to write that!
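
The "oändlig" problem can be made concrete with a minimal Python sketch.
It assumes that the bold Latin letters would sit at U+1D400 (A-Z) and
U+1D41A (a-z), which is where current Unicode data places them; the
function name to_math_bold is made up for illustration.  It only shows
that a per-character mapping has nothing to offer for "ä"; it says
nothing about how any particular MathML tool would behave.

```python
# Sketch: map a word to "mathematical bold" code points, one character
# at a time, and report the characters that have no bold counterpart.
# Assumption: bold A-Z start at U+1D400 and bold a-z at U+1D41A.
BOLD_UPPER_A = 0x1D400
BOLD_LOWER_A = 0x1D41A


def to_math_bold(word):
    """Return (converted, unmapped) for the given word."""
    out = []
    unmapped = []
    for ch in word:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            # No bold clone in the assumed ranges: keep the character as-is.
            out.append(ch)
            unmapped.append(ch)
    return "".join(out), unmapped


_, missing_en = to_math_bold("infinitary")
_, missing_sv = to_math_bold("o\u00e4ndlig")  # "oändlig", precomposed ä
print(missing_en)  # []
print(missing_sv)  # ['ä']
```

The English identifier converts completely, while the Swedish one is
left with an ordinary "ä" in the middle of otherwise bold letters.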
 
Character properties
The 'math alphanumeric characters' are not symbols any more than an
ordinary letter is a symbol.  So these characters, if adopted (which they
definitely should NOT be), should unequivocally be given the general
categories Lu, Ll, and Nd as appropriate, with compatibility (<font>)
mappings to the ordinary letters and digits.  Notice that even the
proponents of these "math alphanumeric characters" seem to propose using
the ordinary letters and digits in math expressions too (though it is not
entirely clear for exactly what; upright non-bold letters and digits?).

Notice that (Latin, Greek) letters in math expressions are most commonly
italic.  Non-italic letters in math expressions are much more of an
exception.  That is why (La)TeX by default sets letters in math
expressions in italic.
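
For anyone who wants to check what properties such characters carry in a
given implementation, a small Python sketch follows.  It assumes a Python
runtime whose Unicode data contains bold letters and digits at U+1D400,
U+1D41A, and U+1D7CE; those code points come from current Unicode data,
not from the proposal text, and the helper name describe is made up.

```python
import unicodedata

# Assumed code points (from current Unicode data, not from the proposal).
SAMPLES = {
    "bold capital A": "\U0001D400",
    "bold small a": "\U0001D41A",
    "bold digit zero": "\U0001D7CE",
}


def describe(ch):
    """Return (general category, decomposition mapping) for one character."""
    return unicodedata.category(ch), unicodedata.decomposition(ch)


for label, ch in SAMPLES.items():
    print(label, describe(ch))

# Compatibility normalisation folds the bold letters back to ordinary
# ones, which is what a search for "abc" would have to rely on.
folded = unicodedata.normalize("NFKC", "\U0001D41A\U0001D41B\U0001D41C")
print(folded)
```

With such data the categories come out as Lu, Ll, and Nd, each with a
<font> compatibility mapping to the ordinary letter or digit.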
 
 
 
Alleged added mark-up verbosity
The only "hard and fast" argument for including these "math alphanumeric
characters" appears to be to "save some bandwidth", in that using mark-up
instead would be more verbose.  This is, however, 100% false.  If the
mark-up scheme is done in any reasonable way, using mark-up instead is
(marginally) LESS verbose than using these "math alphanumeric
characters".
Example:

  "math alphanumeric characters" (in a MathML setting):
      <mi>abc</mi>                      (upright non-bold???)
      <mi>&bolda;&boldb;&boldc;</mi>    (upright bold)
      <mi>&fraka;&frakb;&frakc;</mi>    (fraktur)
      (etc. for the less than handful of different varieties)

  (one possible, reasonably done) "mark-up instead" alternative:
      <mr>abc</mr>    (upright non-bold)
      <mb>abc</mb>    (upright bold)
      <mf>abc</mf>    (fraktur)
      (etc. for the less than handful of different varieties)

Shortening the entity names or using the "math alphanumeric characters"
directly (in UTF-8 or UTF-16), which the proponents apparently suggest,
is still more verbose than the alternative mark-up version given here.
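
The byte counts are easy to check.  The following Python sketch compares
the three forms of a bold "abc"; it assumes the bold letters are plane 1
characters at U+1D41A..U+1D41C (taken from current Unicode data), and the
helper name size is made up for illustration.

```python
# Three ways of writing a bold "abc" in a MathML-like setting.
entity_form = "<mi>&bolda;&boldb;&boldc;</mi>"
# Assumption: bold a, b, c are the plane 1 code points U+1D41A..U+1D41C.
direct_form = "<mi>\U0001D41A\U0001D41B\U0001D41C</mi>"
markup_form = "<mb>abc</mb>"


def size(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


for name, text in [("entities", entity_form),
                   ("direct", direct_form),
                   ("mark-up", markup_form)]:
    # utf-16-le is used so that no byte order mark is counted.
    print(name, size(text, "utf-8"), size(text, "utf-16-le"))
```

Under these assumptions the mark-up form takes 12 bytes in UTF-8 and 24
in UTF-16, against 21 and 30 for the directly encoded characters, since
each plane 1 character costs four bytes in either encoding.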

There is only a handful of varieties; I'm NOT suggesting that each and
every font difference counts.  (I'm also avoiding the word "style" since
some people seem to misunderstand what that would mean.)
 
Bold (non-alphanumeric) symbols
If "math alphanumeric characters" are 'needed' because of semantic
distinctions between the few varieties, then all "math symbols" (category
Sm) also need to be duplicated in bold versions.  Is this the plan?  If
not, why not?  Bold symbols are sometimes used in a semantically distinct
way relative to the corresponding non-bold symbol.  The reasoning for
both "math alphanumeric characters" and "bold math symbols" would be the
same, and they should be treated the same way when it comes to encoding
considerations!

Bold symbols are obtained in LaTeX via the \boldsymbol command, or via
the \pmb command.  (\pmb is 'poor man's bold', which simulates bold by
overtyping.  Handy if the desired bold symbol is not available (in true
bold) in the installed symbol font.)
 
Semantic significance
The different varieties of letters in math, like italic, bold, and
fraktur, do signify a semantic difference, and so do bold vs. non-bold
versions of other (Sm) symbols.  This does not mean that this difference
needs to be mediated through different character allocations.  Indeed,
MathML makes a semantic difference between <mi> and <mn>, as well as a
host of other such differences.  There is no reason why MathML, and
similar mark-up schemes, could not make the difference between, say,
italic and fraktur a mark-up one: <mi> (italic), <mf> (fraktur).
 
Math is inherently "non-plain" text
Very little math can be written without mark-up of some sort.  Even
Murray's "plain text math" is a kind of mark-up (very much its own).
 
Multi-letter identifiers and I18n
Some branches of math, computing science in particular, use multi-letter
identifiers also in mathematical expressions.  If these are expressed in
any language other than English, making them, e.g., bold suddenly needs a
different mechanism.  It is very unlikely that any systems will handle
this gracefully if they are geared towards using "math alphanumeric
characters".  Likewise, making symbols bold will require a separate
mechanism, unless you plan to also allocate "bold math symbols" as
separate characters.
 
Old TeX vs. modern LaTeX
Old TeX used commands like \calE to get a calligraphic ('script') E.
Each available letter in each available 'math' variety had its own
command.  This is very similar to the "math alphanumeric characters"
proposal.

However, modern LaTeX has abandoned that approach, and instead uses
parametric commands, where the parameter is the letters (plural!) to be
set in a particular variety.  E.g. \mathcal{E} to get a calligraphic
('script') E.  This way multi-letter identifiers can be handled
gracefully, and this in principle allows multi-letter identifiers (in
math expressions!) that need not be derived from *English* words, but
can be from some other language.

LaTeX math identifier 'commands' (cmp. 'mark-up'):
  \mathit{abc}    Italic (in principle, default for single-letter
                  identifiers in LaTeX)
  \mathbf{abc}    Bold identifiers
  \mathrm{abc}    Upright, non-bold (typically: "sup", "sin", "lim", ...)
  \mathcal{abc}   "Calligraphic"/"Script" identifiers
  \mathsf{abc}    Sans-serif identifiers
  \mathtt{abc}    "Teletype"/"monospace"/"typewriter" identifiers
  \frak{abc}      Fraktur identifiers (amstex package)
  \Bbb{abc}       Double-struck (black-board bold) identifiers (amstex
                  package)
There is nothing *in principle* preventing "internationalised" identifiers here.
Note that LaTeX (with amstex package) also has:
  \boldsymbol{+}  Bold symbols (incl. sequences of symbols;
                  \boldsymbol{+\infty})
  \pmb{+}         Fallback for bold symbols ('poor man's bold'; does
                  overtyping; useful if the symbol font does not have
                  the desired symbol(s) in "true" bold)

There is no problem in introducing similar mark-up distinctions in
MathML-ish schemes, for example like this (just an example of how it
could be done):

  <mi>abc</mi>   Italic identifiers
  <mb>abc</mb>   Bold identifiers
  <mr>abc</mr>   Upright, non-bold (typically: "sup", "sin", "lim", ...)
  <mc>abc</mc>   "Calligraphic"/"Script" identifiers
  <ms>abc</ms>   Sans-serif identifiers
  <mt>abc</mt>   "Teletype"/"monospace"/"typewriter" identifiers
  <mf>abc</mf>   Fraktur identifiers
  <md>abc</md>   Double-struck (black-board bold) identifiers
  <mn>123</mn>   Upright non-bold numerals
  <mm>123</mm>   Bold numerals
  <ml>123</ml>   Italic numerals
  <mo>+</mo>     Non-bold symbols
  <mp>+</mp>     Bold symbols

There is nothing in principle preventing "internationalised" identifiers
here.  This method does not affect Unicode in any way: no new characters
at all.  But it does allow for 1) internationalised multi-letter
identifiers, and 2) bold symbols too.  And that without any private use
characters, plane 1 characters, or bold clones of symbols.  It's more
general and flexible too.  If mathematics develops so that, say, italic
sans-serif were a new recognised variety, no new characters need be
added, just a new tag in the mark-up scheme.
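
How little machinery the mark-up route needs can be sketched in a few
lines of Python.  The sketch rewrites the example tags from the list
above into the corresponding LaTeX commands, purely as a string
conversion.  The tag names are the hypothetical ones from my example, not
real MathML elements; the names VARIETY_TAGS and to_latex, the tag "msi",
and the command "\mathsfit" used for the new variety are likewise
assumptions made up for illustration.

```python
import re

# Hypothetical variety tags (from the example list) and the LaTeX
# commands they correspond to.
VARIETY_TAGS = {
    "mi": r"\mathit", "mb": r"\mathbf", "mr": r"\mathrm",
    "mc": r"\mathcal", "ms": r"\mathsf", "mt": r"\mathtt",
    "mf": r"\frak", "md": r"\Bbb", "mp": r"\boldsymbol",
}

# Matches one non-nested element such as <mb>abc</mb>.
TOKEN = re.compile(r"<(\w+)>(.*?)</\1>")


def to_latex(markup, tags=VARIETY_TAGS):
    """Rewrite each known variety element as command{content}."""
    def render(match):
        tag, content = match.group(1), match.group(2)
        command = tags.get(tag)
        # Elements without a listed command are emitted as plain content.
        return content if command is None else command + "{" + content + "}"
    return TOKEN.sub(render, markup)


# The content is passed through untouched, so any letters will do.
print(to_latex("<mb>o\u00e4ndlig</mb>"))
print(to_latex("<mp>+</mp>"))

# A new variety is one more table entry, not a new set of characters.
extended = dict(VARIETY_TAGS, msi=r"\mathsfit")
print(to_latex("<msi>abc</msi>", extended))
```

The identifier content never has to be translated character by
character, which is why "oändlig" is no harder than "infinitary" here.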
 
Existing "math alpha chars" should NOT be used
The existing "math alphanumeric" characters (in the BMP) should NOT be
used, in particular not with mark-up schemes that can (and should) make
the distinction by mark-up (like <mi>i</mi>, <mc>R</mc>, etc.).  That the
existing "math alphanumeric" characters (in the BMP) were ever encoded
should be regarded as a mistake.
 
/Kent Karlsson