DATE: 1998-12-01

L2/98-406

DOC TYPE:	Expert contribution
TITLE:	Proposal to encode mathematical variant tags
SOURCE:	Murray Sargent III
PROJECT:
STATUS:	Proposal
ACTION ID:	FYI
DUE DATE:	--
DISTRIBUTION:	Worldwide
MEDIUM:	Paper and html
NO. OF PAGES:	4

A. Administrative
1. Title	Proposal to encode mathematical variant tags
2. Requester's name	Murray Sargent III
3. Requester type	Expert request.
4. Submission date	1998-12-01
5. Requester's reference	Scientific and Technical Information Exchange (STIX)
6a. Completion	Complete proposal
6b. More information to be provided?	If requested

B. Technical -- General
1a. New script? Name?	No.
1b. Addition of characters to existing block? Name?	No.
2. Number of characters	16
3. Proposed category
4. Proposed level of implementation and rationale	Level 3 since math variant tags qualify the base letter they follow
5a. Character names included in proposal?	10 are defined. Recommended to reserve 6 to have a group of 16
5b. Character names in accordance with guidelines?	Yes.
5c. Character shapes reviewable?
6a. Who will provide computerized font?	None needed
6b. Font currently available?	None needed
6c. Font format?	na
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?	Yes.
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?	Not attached, but available.
8. Does the proposal address other aspects of character data processing?	No

C. Technical -- Justification
1. Contact with the user community?	Yes. Patrick Ion, Barbara Beeton, Murray Sargent III
2. Information on the user community?	Professional mathematicians, physicists, astronomers, engineers, and other scientific and technical researchers.
3a. The context of use for the proposed characters?	Used in publication of research mathematics and other hard sciences.
3b. Reference
4a. Proposed characters in current use?	Yes.
4b. Where?	Worldwide, by scientific and technical publishers.
5a. Characters should be encoded entirely in BMP?	Yes.
5b. Rationale	Accurate publication of mathematical and scientific research on the Web is impossible without a comprehensive and accurate collection of symbols including various alphabetic variants in common use. Allocation in the BMP is in accordance with the Roadmap.
6. Should characters be kept in a continuous range?	Yes
7a. Can the characters be considered a presentation form of an existing character or character sequence?	No. The math variant tags modify the base character they follow in a way that changes that character's semantics, i.e., it's a different character when followed by a math variant tag than it is when it isn't followed by such a tag.
7b. Where?
7c. Reference
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?	No
8b. Where?
8c. Reference
9a. Combining characters or use of composite sequences included?	Yes
9b. List of composite sequences and their corresponding glyph images provided?	A list is provided below, but the corresponding glyphs are well known and are omitted.
10. Characters with any special properties such as control function, etc. included?	All the characters are modifier characters and therefore have a control-like function.

D. SC2/WG2 Administrative To be completed by SC2/WG2
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition)
3. Additional contact to user communities, liaison organizations etc.
4. Assigned category and assigned priority/time frame
Other Comments

Mathematics has need for a number of Latin and Greek alphabets that on first thought appear to be font variations of one another, e.g., normal, bold, italic and script H. However, in any given document these characters have distinct mathematical semantics. For example, a normal H represents a different variable from a bold H. If one drops these distinctions in plain text, one gets gibberish. Instead of the well-known Hamiltonian formula

\mathcal{H} = \int d\tau\,(\varepsilon \mathbf{E}^2 + \mu \mathbf{H}^2),

you'd get the integral equation (!)

H = \int d\tau\,(\varepsilon E^2 + \mu H^2).

Accordingly, the STIX project requests adding normal, bold, italic, script, etc., Latin and Greek alphabets. Encoding these directly would require many characters and would lose some useful common information; for example, the variants of H might no longer be recognizable as H's. But it does allow plain text to retain the proper character semantics, and it allows simple (nonrich) search methods to work.

A more useful encoding that still allows simple search algorithms to work employs "math variant tags", which act in some ways like nonspacing combining marks. For example, a math script H would be encoded as H<math script>. On encountering such a combination, a rendering engine should choose some script font to render the H. Which script font it chooses is beyond the scope of plain text.
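The following is a minimal sketch, not part of the proposal, of how software might group a base character with the math variant tags that follow it, much as it groups a base character with its combining marks. The proposal assigns no code points, so this sketch assumes hypothetical stand-ins in the Private Use Area (U+E000 + tag number); the names `tag` and `clusters` are likewise illustrative only.

```python
# HYPOTHETICAL: the proposal reserves 16 math variant tags but assigns no
# code points, so this sketch borrows U+E000..U+E00F from the Private Use Area.
TAG_BASE = 0xE000
TAG_COUNT = 16

# Tag numbers as listed in the proposal.
TAG_NAMES = {
    0: "math italic",
    1: "math bold",
    2: "calligraphic (script)",
    3: "fraktur",
    4: "open-face",
    5: "sans-serif",
    6: "monospace",
}


def tag(index: int) -> str:
    """Return the stand-in character for math variant tag number `index`."""
    if not 0 <= index < TAG_COUNT:
        raise ValueError(f"tag index out of range: {index}")
    return chr(TAG_BASE + index)


def is_tag(ch: str) -> bool:
    """True if `ch` falls in the stand-in math variant tag range."""
    return TAG_BASE <= ord(ch) < TAG_BASE + TAG_COUNT


def clusters(text: str) -> list[tuple[str, list[int]]]:
    """Split text into (base character, [tag numbers]) pairs.

    Each tag attaches to the preceding base character, the way a
    nonspacing combining mark would. A renderer could use the tag list
    to choose a font for the base; an empty list means the default
    (roman) form.
    """
    result: list[tuple[str, list[int]]] = []
    for ch in text:
        if is_tag(ch):
            if not result:
                raise ValueError("math variant tag with no base character")
            result[-1][1].append(ord(ch) - TAG_BASE)
        else:
            result.append((ch, []))
    return result
```

For example, `clusters("H" + tag(2) + "=H")` yields a script H, an equals sign, and a plain H as three separate clusters. Because the tags trail their base letter, a search for the letter H still finds both occurrences.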

By default, math alphabetic characters would be considered roman characters (serifed, neither bold nor italic). To change this status, I propose reserving a block of 16 math variant tags with the following values defined:

0. math italic

1. math bold

2. calligraphic (script)

3. fraktur

4. open-face

5. sans-serif

6. monospace

Zero or more such tags can follow a base character. So a math bold italic H would be encoded as H<math italic><math bold> or as H<math bold><math italic>. For the simplest "math-unaware" search algorithms to match a given string, it's desirable to standardize on a single order, namely the one above. A slightly more sophisticated algorithm can instead encode the tags as bits and match tags given in any order.
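Both matching strategies can be sketched in a few lines of Python. This is an illustration, not part of the proposal, and it assumes the same hypothetical Private Use Area stand-ins (tag n at U+E000 + n). Since those stand-ins ascend in list order, sorting each run of tags puts it in the standard order, after which a plain substring search works. The alternative folds each run into a bit set (bit n set if tag n is present), so tag order no longer matters. The function names are illustrative.

```python
# HYPOTHETICAL stand-in code points: tag number n is U+E000 + n.
TAG_BASE = 0xE000
TAG_COUNT = 16


def is_tag(ch: str) -> bool:
    return TAG_BASE <= ord(ch) < TAG_BASE + TAG_COUNT


def canonicalize(text: str) -> str:
    """Sort each run of tags after a base character into list order,
    so a math-unaware substring search sees one spelling per variant."""
    out: list[str] = []
    run: list[str] = []
    for ch in text:
        if is_tag(ch):
            run.append(ch)
        else:
            out.extend(sorted(run))  # flush the previous base's tags
            run = []
            out.append(ch)
    out.extend(sorted(run))  # tags trailing the final base character
    return "".join(out)


def tag_masks(text: str) -> list[tuple[str, int]]:
    """Pair each base character with a bit set of its tags."""
    result: list[tuple[str, int]] = []
    for ch in text:
        if is_tag(ch):
            if not result:
                raise ValueError("math variant tag with no base character")
            base, mask = result[-1]
            result[-1] = (base, mask | (1 << (ord(ch) - TAG_BASE)))
        else:
            result.append((ch, 0))
    return result


def same_math_text(a: str, b: str) -> bool:
    """Order-independent comparison using the bit-set encoding."""
    return tag_masks(a) == tag_masks(b)
```

With `italic = chr(TAG_BASE)` and `bold = chr(TAG_BASE + 1)`, the strings `"H" + italic + bold` and `"H" + bold + italic` differ as raw text but compare equal after `canonicalize` and under `same_math_text`. One design difference: the bit set also treats a repeated tag as a single tag, whereas sorting preserves repeats.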

To allow for other cases not currently given, it's desirable to reserve a block of 16 such math tags.
