Fwd: PDUTR #25: Unicode Support for Mathematics

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Jan 03 2002 - 22:43:10 EST


FYI, Markus Kuhn sent the following comments to the authors of PDUTR #25.
They are of sufficient general interest to warrant discussion on
the list.

A./

PS: I've refreshed the HTML, so people should have fewer problems
reading the equations in section 5 on Win2K or XP.

>X-Mailer: exmh version 2.3+CL 01/14/2001 with nmh-1.0.4
>To: Barbara Beeton <bnb@ams.org>, Asmus Freytag <asmus@unicode.org>,
> Murray Sargent III <murrays@microsoft.com>
>cc: linux-utf8@nl.linux.org (linux-utf8)
>Subject: PDUTR #25: Unicode Support for Mathematics
>X-URL: http://www.cl.cam.ac.uk/~mgk25/
>Date: Thu, 03 Jan 2002 17:11:50 +0000
>From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
>
>Dear Unicode Maths team,
>
>I've read with enthusiasm your draft document
>
> http://www.unicode.org/unicode/reports/tr25/
>
>and have great hopes that this project for "Unicode Plain Text Encoding
>of Mathematics" will progress well and be widely implemented once it is
>finished!
>
>I thought (from comp.text.sgml discussions in the early 1990s) that it
>was in general widely accepted that SGML is in practice far too
>inconvenient for entering mathematical text and that any math DTD will
>not lead naturally to intuitive and consistent keyboard entry
>techniques, which is why I always considered MathML more an academic
>exercise than anything that I would ever really want to use to get work
>done. MathML has never been anywhere near being a potential competitor
>for TeX.
>
>I therefore observe with great interest that Unicode plans to treat
>mathematics as just yet another complex script (like Indic, etc.), in a
>way such that finally authors of SGML/XML document type definitions and
>style sheets will not have to make much further provisions for support
>of mathematics than for example define a single element for marking a
>displayed equation. Also the prospect of being able to search for
>mathematical formula fragments with web search engines is exciting.
>
>A few comments on the current draft:
>
> - It is not yet clear how white space is to be handled. In TeX,
> math mode has a lot of heuristics for adding white space where
> mathematical typographic tradition finds it convenient, for example
> around every operator. It has often been observed that scientific papers
> written in Word have far inferior mathematical spacing to
> papers written in TeX, because TeX's heuristic algorithms do
> far better than an inexperienced author. However, these heuristics
> fail frequently, and more often than desirable, TeX users have to
> manually override the math spacing with \, and the like.
>
> Your current text does not yet make it clear whether the additional
> white space used around mathematical operators will be added by the
> rendering engine and font (as in TeX) or will be encoded in the plain
> text. I suspect encoding the whitespace in the plaintext is ultimately
> preferable, as it will ensure more control in a portable way, even
> though that means that typographic beginners will be more likely
> to produce ugly formulas. Heuristics like TeX's would have to become
> part of the keyboard entry and style checking mechanisms of the
> editor (like the Word spell checker), not of the rendering engine.
> This should hopefully make results more predictable across a wide
> range of rendering engines.
>
> - On section 5.1 "Recognizing Mathematical Expressions":
> With intra-formula white-space being encoded in the plain text, and
> variables typically being written in the Plane 1 math characters, there
> should never be a need to explicitly delimit mathematical formulas
> from "normal text", since to the rendering engine they would just be
> normal text. In other words, it would be desirable if your proposal
> made section 5.1 unnecessary.
>
> - What is missing at the moment is a mechanism for handling matrices,
> commutative diagrams, and similar tabular arrangements of inline
> formulas. Most markup languages and rendering engines already have
> very sophisticated mechanisms for the layout of tables. I think
> the best approach would be to simply use or slightly extend the
> already available table mechanism to encode matrices. All that Unicode
> has to add is a combining modifier corresponding to TeX's \left and
> \right commands that instructs a delimiter glyph to grow with the
> height of the text in between, which could include an inline table with
> centered alignment. Don't duplicate what the existing table engines
> already provide. In that light, I would reconsider the need for the
> briefly mentioned align-over operator.
>
> Using the table mechanism of the higher markup language has numerous
> advantages:
>
> - the DTD keeps control over where matrices are allowed (e.g., only in
> displayed equations, but not inline and not in headings or
> keyword lists)
>
> - layout and cut&paste selection in tables is a very complex process,
> you really don't want to have to implement that twice
>
> It is true that plaintext Unicode matrices would simplify the
> cut&paste of matrices as well, but that is probably not worth the cost
> of blurring the currently quite clear interface between a paragraph
> rendering engine and a page/table layout engine. Dramatically simplified
> versions of MathML on top of plaintext Unicode math can still be used
> to encode matrices in a portable and reusable way.
>
> - A stylistic comment: I think it would suit the text better not
> to spend so much time criticizing TeX and MathML.
> Knowledgeable readers will be well familiar with TeX and will
> discover for themselves the advantages of your approach over
> existing practice, and the inadequacies of MathML are obvious to
> anyone who has had even a brief look at the entire idea of encoding
> formulas in XML.
>
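
Editorial sketch, not part of the quoted message: the first comment above
suggests moving TeX-style spacing heuristics out of the rendering engine
and into the editor, so that the white space ends up encoded in the plain
text. One way this might look is shown below. Everything in it is an
assumption made for illustration: the function name, the operator sets,
and the choice of U+2009 THIN SPACE and U+205F MEDIUM MATHEMATICAL SPACE
are not taken from PDUTR #25 or from TeX.

```python
# Hypothetical editor-side helper; names and spacing choices are assumptions.
THIN_SPACE = "\u2009"    # assumed spacing around binary operators
MEDIUM_SPACE = "\u205F"  # assumed spacing around relations

BINARY_OPERATORS = set("+\u2212\u00D7\u22C5")  # plus, minus sign, times, dot
RELATIONS = set("=<>\u2264\u2265\u2260")       # equals, less, greater, etc.


def add_math_spacing(formula: str) -> str:
    """Return the formula with explicit space characters around operators.

    An operator is treated as binary only when it follows an operand, so a
    leading or post-relation minus sign is left unspaced (unary use).
    """
    out = []
    prev_is_operand = False
    for ch in formula:
        if ch in RELATIONS:
            out.append(MEDIUM_SPACE + ch + MEDIUM_SPACE)
            prev_is_operand = False
        elif ch in BINARY_OPERATORS and prev_is_operand:
            out.append(THIN_SPACE + ch + THIN_SPACE)
            prev_is_operand = False
        else:
            # Operands, and operators in unary position, pass through.
            out.append(ch)
            prev_is_operand = ch not in BINARY_OPERATORS
    return "".join(out)


spaced = add_math_spacing("a+b=\u2212c")
```

Because the spaces are ordinary characters in the resulting string, an
author who disagrees with the heuristic can delete or replace them in the
editor, and every rendering engine sees the same text.
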
>The proposal is certainly still in an early stage, but it is heading in
>the right direction and I will follow its progress with great interest!
>
>Markus
>
>--
>Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
>Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
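
Editorial sketch, not part of the quoted message: the comment on section
5.1 rests on variables being written with the Plane 1 math characters,
which identify themselves without any delimiters. A minimal illustration,
assuming the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF)
is what is meant, is given below. The function names are hypothetical, and
the sketch ignores the few letters, such as italic h at U+210E, that are
encoded outside that block.

```python
# Hypothetical helpers; the block range is the Mathematical Alphanumeric
# Symbols block, and the function names are assumptions for illustration.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF


def is_plane1_math_char(ch: str) -> bool:
    """True if ch lies in the Mathematical Alphanumeric Symbols block."""
    return MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST


def math_variables(text: str) -> list[str]:
    """Collect the Plane 1 math characters found in running text."""
    return [ch for ch in text if is_plane1_math_char(ch)]


# U+1D465 is MATHEMATICAL ITALIC SMALL X, embedded in ordinary prose.
sample = "Let \U0001D465 be a real number."
found = math_variables(sample)
```

A search tool or renderer could use a test of this kind to pick out
variables in running text with no explicit math delimiters.
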



This archive was generated by hypermail 2.1.2 : Fri Jan 04 2002 - 02:51:06 EST