L2/00-249
From: Karlsson Kent - keka [keka@im.se]
Sent: Thursday, August 03, 2000 6:49 AM
To: Multiple Recipients of Unicore
Subject: RE: UTC Agenda item: Mathematical Letter Symbols
Regarding the "math alphanumeric characters" proposal
-----------------------------------------------------------------------------
I've finally got some time to comment on this issue. I've been too busy
editing a somewhat math-oriented document that distinguishes between upright
non-bold, bold, and italic versions of the same sequence of letters, as well
as between bold and non-bold versions of the same symbols (for plus, minus,
and infinity, as it happens). It also uses multi-letter identifiers in math
expressions. That the identifiers are multi-letter is important: the document
would be unreadable if single-letter identifiers had been used throughout.
I'm very strongly opposed to the "math alphanumeric characters" proposal.
As someone who would be a 'user' of the "math alphanumeric characters" if they
were to be accepted and then used in, e.g., MathML, I very much fear the
problems that will result: problems setting or changing the variety, problems
with searches, and problems getting the desired identifiers in the desired
variety. E.g., I might not be able to get a bold "oändlig", or would at least
have severe problems finding and using a work-around. This is not an
unrealistic example: the document I've been busy with has the bold identifier
"infinitary" (in math expressions!). If I were to translate the document to
Swedish, that would be a bold "oändlig". And the "math alphanumeric
characters" do not allow me to write that!
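The difficulty can be sketched in Python. This is only an illustration: it assumes the bold letters and digits sit in contiguous runs starting at U+1D400 (capitals), U+1D41A (small letters) and U+1D7CE (digits), which is where current Unicode character data places them, and the helper name to_math_bold is hypothetical.

```python
# Assumed code point layout for the bold "math alphanumeric characters"
# (taken from current Unicode character data, not from this message).
BOLD_CAPITAL_A = 0x1D400
BOLD_SMALL_A = 0x1D41A
BOLD_DIGIT_ZERO = 0x1D7CE


def to_math_bold(identifier):
    """Map each character to its bold clone; collect those that have none."""
    converted = []
    unmapped = []
    for ch in identifier:
        if "A" <= ch <= "Z":
            converted.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            converted.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            converted.append(chr(BOLD_DIGIT_ZERO + ord(ch) - ord("0")))
        else:
            converted.append(ch)  # stays an ordinary, non-bold character
            unmapped.append(ch)
    return "".join(converted), unmapped


english_bold, english_missing = to_math_bold("infinitary")
# "\u00e4" is the precomposed letter a-with-diaeresis in "oändlig".
swedish_bold, swedish_missing = to_math_bold("o\u00e4ndlig")

print(english_missing)               # every English letter has a bold clone
print(swedish_missing)               # the a-with-diaeresis has none
print("infinitary" in english_bold)  # a plain-text search no longer matches
```

The English identifier converts completely, but a search for the plain word then fails, and the Swedish identifier is left with one letter that has no bold clone at all.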
Character properties
The 'math alphanumeric characters' are not symbols any more than an ordinary
letter is a symbol. So these characters, if adopted (which they definitely
should NOT be), should unequivocally be given the general categories Lu, Ll,
and Nd as appropriate, with compatibility (<font>) mappings to the ordinary
letters and digits. Notice that even the proponents of these "math
alphanumeric characters" seem to propose using the ordinary letters and
digits in math expressions too (though it is not entirely clear for exactly
what; upright non-bold letters and digits?).
Notice that (Latin, Greek) letters in math expressions are most commonly
italic. Non-italic letters in math expressions are much more of an exception.
That is why (La)TeX by default sets letters in math expressions in italic.
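What such property assignments look like in practice can be checked with Python's standard unicodedata module. This is a sketch under one assumption: the interpreter's bundled Unicode Character Database must contain the plane 1 code points used below (U+1D400, U+1D41A, U+1D7CE), which were assigned later than this message; the helper name fold_to_ordinary is hypothetical.

```python
import unicodedata

# Assumed plane 1 code points for three bold "math alphanumeric characters".
samples = {
    "bold capital A": chr(0x1D400),
    "bold small a": chr(0x1D41A),
    "bold digit zero": chr(0x1D7CE),
}

properties = {
    label: (unicodedata.category(ch), unicodedata.decomposition(ch))
    for label, ch in samples.items()
}
for label, (category, mapping) in properties.items():
    print(f"{label:16} category={category}  mapping={mapping}")


def fold_to_ordinary(text):
    """Apply the compatibility mappings, giving ordinary letters and digits."""
    return unicodedata.normalize("NFKC", text)


bold_abc = "".join(chr(0x1D41A + i) for i in range(3))
print(fold_to_ordinary(bold_abc))
```

With a database that has these characters, the categories come out as Lu, Ll and Nd, each with a <font> compatibility mapping to the ordinary letter or digit, so compatibility normalisation folds the bold clones back to plain text.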
Alleged added mark-up verbosity
The only "hard and fast" argument for including these "math alphanumeric
characters" appears to be to "save some bandwidth", in that using mark-up
instead would be more verbose. This is, however, 100% false. If the mark-up
scheme is done in any reasonable way, using mark-up instead is (marginally)
LESS verbose than using these "math alphanumeric characters".
Example:
"math alphanumeric characters" (in a MathML setting):
    <mi>abc</mi>                      (upright non-bold???)
    <mi>&bolda;&boldb;&boldc;</mi>    (upright bold)
    <mi>&fraka;&frakb;&frakc;</mi>    (fraktur)
    (etc. for the less than handful of different varieties)
(one possible, reasonably done) "mark-up instead" alternative:
    <mr>abc</mr>                      (upright non-bold)
    <mb>abc</mb>                      (upright bold)
    <mf>abc</mf>                      (fraktur)
    (etc. for the less than handful of different varieties)
Shortening the entity names, or using the "math alphanumeric characters"
directly (in UTF-8 or UTF-16), which the proponents apparently suggest,
is still more verbose than the alternative mark-up version given here.
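The byte counts can be checked with a few lines of Python. The sketch assumes the bold letters are plane 1 characters starting at U+1D41A (their position in current Unicode character data); the <mb> tag is the hypothetical one from the example above, not existing MathML.

```python
# Assumed plane 1 code points for bold a, b, c.
BOLD_ABC = "".join(chr(0x1D41A + i) for i in range(3))

candidates = {
    "entities": "<mi>&bolda;&boldb;&boldc;</mi>",
    "direct characters": "<mi>" + BOLD_ABC + "</mi>",
    "mark-up instead": "<mb>abc</mb>",
}


def encoded_sizes(text):
    """Return the size in bytes of text in UTF-8 and UTF-16 (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


sizes = {name: encoded_sizes(text) for name, text in candidates.items()}
for name, size in sizes.items():
    print(f"{name:18} UTF-8: {size['utf-8']:2} bytes   UTF-16: {size['utf-16']:2} bytes")
```

For this three-letter identifier the mark-up version takes 12 bytes in UTF-8 and 24 in UTF-16, against 21 and 30 for the directly encoded characters and 30 and 60 for the entity version. The tags are the same length in every case, so the gap only grows with longer identifiers.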
There is only a handful of varieties; I'm NOT suggesting that each and every
font difference counts. (I'm also avoiding the word "style", since some
people seem to misunderstand what that would mean.)
Bold (non-alphanumeric) symbols
If "math alphanumeric characters" are 'needed' because of semantic
distinctions between the few varieties, then all "math symbols" (category Sm)
also need to be duplicated in bold versions. Is this the plan? If not, why
not? Bold symbols are sometimes used in a semantically distinct way relative
to the corresponding non-bold symbol. The reasoning for "math alphanumeric
characters" and for "bold math symbols" would be the same, and the two should
be treated the same way when it comes to encoding considerations!
In LaTeX, bold symbols are obtained via the \boldsymbol command or via the
\pmb command. (\pmb is 'poor man's bold', which simulates bold by overtyping.
It is handy if the desired symbol is not available in true bold in the
installed symbol font.)
Semantic significance
The different varieties of letters in math, like italic, bold, and fraktur,
do signify a semantic difference, and so do bold vs. non-bold versions of
other (Sm) symbols. This does not mean that this difference needs to be
mediated through different character allocations. Indeed, MathML makes a
semantic difference between <mi> and <mn>, as well as a host of other such
differences. There is no reason why MathML, and similar mark-up schemes,
could not make the difference between, say, italic and fraktur a mark-up one:
<mi> (italic), <mf> (fraktur).
Math is inherently "non-plain" text
Very little math can be written without mark-up of some sort. Even Murray's
"plain text math" is a kind of mark-up (of its very own).
Multi-letter identifiers and I18n
Some branches of math, computing science in particular, use multi-letter
identifiers also in mathematical expressions. If these are expressed in any
language other than English, making them, e.g., bold suddenly needs a
different mechanism. It is very unlikely that any system geared towards using
"math alphanumeric characters" will handle this gracefully. Likewise, making
symbols bold will require a separate mechanism, unless you plan to also
allocate "bold math symbols" as separate characters.
Old TeX vs. modern LaTeX
Old TeX used commands like \calE to get a calligraphic ('script') E. Each
available letter in each available 'math' variety had its own command. This
is very similar to the "math alphanumeric characters" proposal.
However, modern LaTeX has abandoned that approach and instead uses parametric
commands, where the parameter is the letters (plural!) to be set in a
particular variety. E.g., \mathcal{E} gives a calligraphic ('script') E. This
way multi-letter identifiers can be handled gracefully, and multi-letter
identifiers (in math expressions!) need in principle not be derived from
*English* words, but can come from some other language.
LaTeX math identifier 'commands' (cmp. 'mark-up'):
    \mathit{abc}    Italic (in principle, default for single-letter identifiers in LaTeX)
    \mathbf{abc}    Bold identifiers
    \mathrm{abc}    Upright, non-bold (typically: "sup", "sin", "lim", ...)
    \mathcal{abc}   "Calligraphic"/"Script" identifiers
    \mathsf{abc}    Sans-serif identifiers
    \mathtt{abc}    "Teletype"/"monospace"/"typewriter" identifiers
    \frak{abc}      Fraktur identifiers (amstex package)
    \Bbb{abc}       Double-struck (black-board bold) identifiers (amstex package)
There is nothing *in principle* preventing "internationalised" identifiers here.
Note that LaTeX (with amstex package) also has:
    \boldsymbol{+}  Bold symbols (incl. sequences of symbols; \boldsymbol{+\inf})
    \pmb{+}         Fallback for bold symbols ('poor man's bold'; does overtyping;
                    useful if the symbol font does not have the desired
                    symbol(s) in "true" bold)
There is no problem introducing similar mark-up distinctions in MathML-ish
schemes, for example like this (just an example of how it could be done):
    <mi>abc</mi>    Italic identifiers
    <mb>abc</mb>    Bold identifiers
    <mr>abc</mr>    Upright, non-bold (typically: "sup", "sin", "lim", ...)
    <mc>abc</mc>    "Calligraphic"/"Script" identifiers
    <ms>abc</ms>    Sans-serif identifiers
    <mt>abc</mt>    "Teletype"/"monospace"/"typewriter" identifiers
    <mf>abc</mf>    Fraktur identifiers
    <md>abc</md>    Double-struck (black-board bold) identifiers
    <mn>123</mn>    Upright non-bold numerals
    <mm>123</mm>    Bold numerals
    <ml>123</ml>    Italic numerals
    <mo>+</mo>      Non-bold symbols
    <mp>+</mp>      Bold symbols
There is nothing in principle preventing "internationalised" identifiers here.
This method does not affect Unicode in any way: no new characters at all.
But it does allow for 1) internationalised multi-letter identifiers, and 2)
bold symbols too. And that without any private use characters, plane 1
characters, or bold clones of symbols.
It's more general and flexible too. If mathematics develops so that, say,
italic sans-serif became a newly recognised variety, no new characters would
need to be added, just a new tag in the mark-up scheme.
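A minimal Python sketch of how a tool might handle such a scheme, using the standard xml.etree.ElementTree module. The identifier tags are the example ones listed above, not existing MathML; the helper names mark_up and read_back and the tag name "mx" for italic sans-serif are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Variety -> tag, following the example scheme above (identifier tags only).
VARIETY_TAGS = {
    "italic": "mi",
    "bold": "mb",
    "upright": "mr",
    "calligraphic": "mc",
    "sans-serif": "ms",
    "monospace": "mt",
    "fraktur": "mf",
    "double-struck": "md",
}


def mark_up(identifier, variety):
    """Wrap an identifier, written in ordinary letters, in its variety tag."""
    element = ET.Element(VARIETY_TAGS[variety])
    element.text = identifier
    return ET.tostring(element, encoding="unicode")


def read_back(fragment):
    """Recover the plain identifier and its variety from a marked-up fragment."""
    element = ET.fromstring(fragment)
    tag_to_variety = {tag: variety for variety, tag in VARIETY_TAGS.items()}
    return element.text, tag_to_variety[element.tag]


# "\u00e4" is the letter a-with-diaeresis, so this is a bold "oändlig".
fragment = mark_up("o\u00e4ndlig", "bold")
print(fragment)
print(read_back(fragment))

# A newly recognised variety is one more table entry, not new characters.
VARIETY_TAGS["italic sans-serif"] = "mx"  # hypothetical tag name
new_fragment = mark_up("o\u00e4ndlig", "italic sans-serif")
print(new_fragment)
```

The identifier text stays in ordinary letters throughout, so searching and changing variety both work on plain text, and non-English identifiers need no special treatment.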
Existing "math alpha chars" should NOT be used
The existing "math alphanumeric" characters (in the BMP) should NOT be used,
in particular not with mark-up schemes that can (and should) make the
distinction by mark-up (like <mi>i</mi>, <mc>R</mc>, etc.). That the existing
"math alphanumeric" characters (in the BMP) were ever encoded should be
regarded as a mistake.
/Kent Karlsson