DATE: 1998-12-01
L2/98-406
DOC TYPE: |
Expert contribution |
TITLE: |
Proposal to encode mathematical variant tags |
SOURCE: |
Murray Sargent III |
PROJECT: |
|
STATUS: |
Proposal |
ACTION ID: |
FYI |
DUE DATE: |
-- |
DISTRIBUTION: |
Worldwide |
MEDIUM: |
Paper and html |
NO. OF PAGES: |
4 |
A. Administrative
1. Title |
Proposal to encode mathematical variant tags |
2. Requester's name |
Murray Sargent III |
3. Requester type |
Expert request. |
4. Submission date |
1998-12-01 |
5. Requester's reference |
Scientific and Technical Information Exchange (STIX) |
6a. Completion |
Complete proposal |
6b. More information to be provided? |
If requested |
B. Technical -- General
1a. New script? Name? |
No. |
1b. Addition of characters to existing block? Name? |
No. |
2. Number of characters |
16 |
3. Proposed category |
|
4. Proposed level of implementation and rationale |
Level 3 since math variant tags qualify the base letter they follow |
5a. Character names included in proposal? |
10 are defined. Recommended to reserve 6 to have a group of 16 |
5b. Character names in accordance with guidelines? |
Yes. |
5c. Character shapes reviewable? |
|
6a. Who will provide computerized font? |
None needed |
6b. Font currently available? |
None needed |
6c. Font format? |
na |
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided? |
Yes. |
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? |
Not attached, but available. |
8. Does the proposal address other aspects of character data processing? |
No |
C. Technical -- Justification
1. Contact with the user community? |
Yes. Patrick Ion, Barbara Beeton, Murray Sargent III |
2. Information on the user community? |
Professional mathematicians, physicists, astronomers, engineers, and other scientific and technical researchers. |
3a. The context of use for the proposed characters? |
Used in publication of research mathematics and other hard sciences. |
3b. Reference |
|
4a. Proposed characters in current use? |
Yes. |
4b. Where? |
Worldwide, by scientific and technical publishers. |
5a. Characters should be encoded entirely in BMP? |
Yes. |
5b. Rationale |
Accurate publication of mathematical and scientific research on the Web is impossible without a comprehensive and accurate collection of symbols including various alphabetic variants in common use. Allocation in the BMP is in accordance with the Roadmap. |
6. Should characters be kept in a continuous range? |
Yes |
7a. Can the characters be considered a presentation form of an existing character or character sequence? |
No. The math variant tags modify the base character they follow in a way that changes that character's semantics, i.e., it's a different character when followed by a math variant tag than it is when it isn't followed by such a tag. |
7b. Where? |
|
7c. Reference |
|
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character? |
No |
8b. Where? |
|
8c. Reference |
|
9a. Combining characters or use of composite sequences included? |
Yes |
9b. List of composite sequences and their corresponding glyph images provided? |
A list is provided below, but the corresponding glyphs are well known and are omitted. |
10. Characters with any special properties such as control function, etc. included? |
All the characters are modifier characters, which gives them a control-like function. |
D. SC2/WG2 Administrative
To be completed by SC2/WG2 |
|
1. Relevant SC 2/WG 2 document numbers: |
|
2. Status (list of meeting number and corresponding action or disposition) |
|
3. Additional contact to user communities, liaison organizations etc. |
|
4. Assigned category and assigned priority/time frame |
|
Other Comments |
|
Mathematics needs a number of Latin and
Greek alphabets that at first glance appear to be font variations of one
another, e.g., normal, bold, italic and script H. However, in any given document these characters have distinct
mathematical semantics. For example, a
normal H represents a different variable from a bold H, etc. If one drops these distinctions in plain
text, one gets gibberish. Instead of
the well-known Hamiltonian formula

$$\mathcal{H} = \int d\tau\,(\varepsilon \mathbf{E}^2 + \mu \mathbf{H}^2),$$

you'd get the integral equation (!)

$$H = \int d\tau\,(\varepsilon E^2 + \mu H^2).$$
Accordingly, the STIX project requests adding
normal, bold, italic, script, etc., Latin and Greek alphabets. Encoding each variant as a separate character would require many
characters and would lose some useful shared information: for example, the variants of
H might no longer be recognizable as H's. On the other hand,
it does allow plain text to retain the proper character semantics, and it allows
simple (non-rich-text) search methods to work.
A more useful encoding that still allows simple
search algorithms to work employs "math variant tags", which act in some ways
like nonspacing combining marks. For
example, a math script H would be encoded as H<math script>. On encountering such a combination, a rendering
engine should choose some script font to render the H. Which script font is beyond the scope of
plain text.
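A minimal Python sketch of the combining-mark-like behavior described above. It assumes hypothetical code points: the 16-tag block is placed at U+E000 to U+E00F in the Private Use Area purely for illustration, since this proposal does not assign code points. The hypothetical function `clusters` groups each base character with the tag values that follow it, which is the unit a rendering engine would map to a font.

```python
# Hypothetical placeholder code points for the 16-tag block (Private Use Area);
# the proposal itself does not assign code points.
TAG_BASE = 0xE000
TAG_LIMIT = TAG_BASE + 16
MATH_SCRIPT = chr(TAG_BASE + 2)  # value 2: calligraphic (script)


def is_math_tag(ch: str) -> bool:
    """True if ch falls in the (hypothetical) reserved tag block."""
    return TAG_BASE <= ord(ch) < TAG_LIMIT


def clusters(text: str) -> list[tuple[str, list[int]]]:
    """Group each base character with the tag values that follow it,
    the way a renderer groups a base letter with nonspacing marks."""
    result: list[tuple[str, list[int]]] = []
    for ch in text:
        if is_math_tag(ch) and result:
            # A tag attaches to the preceding base character.
            result[-1][1].append(ord(ch) - TAG_BASE)
        else:
            # A base character (or a stray tag with no base) starts a new cluster.
            result.append((ch, []))
    return result
```

For example, `clusters("H" + MATH_SCRIPT + "=x")` yields `[("H", [2]), ("=", []), ("x", [])]`: one script H followed by two untagged characters. Choosing the actual script font for the first cluster is left to the renderer.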
By default, math alphabetic characters would be
considered to be Roman characters (serifed, neither bold nor italic). To change this status, I propose reserving a
block of 16 math variant tags with the following values defined:
0. math italic
1. math bold
2. calligraphic (script)
3. fraktur
4. open-face
5. sans-serif
6. monospace
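The tag values and the Roman default can be sketched in Python as follows. The names `MathVariant`, `tag` and `describe`, and the placeholder block start U+E000, are assumptions for illustration only; the values 0 through 6 follow the proposal's list.

```python
from enum import IntEnum

# Hypothetical placeholder start of the tag block; not an assigned value.
TAG_BASE = 0xE000


class MathVariant(IntEnum):
    """Tag values as listed in the proposal."""
    ITALIC = 0
    BOLD = 1
    CALLIGRAPHIC = 2  # script
    FRAKTUR = 3
    OPEN_FACE = 4
    SANS_SERIF = 5
    MONOSPACE = 6


def tag(v: MathVariant) -> str:
    """Return the (hypothetical) tag character for variant v."""
    return chr(TAG_BASE + v)


def describe(tags: list[MathVariant]) -> str:
    """Name the style of a base letter; no tags means the Roman default."""
    if not tags:
        return "roman"
    return " ".join(v.name.lower() for v in tags)
```

Here `describe([])` returns `"roman"`, reflecting that an untagged letter keeps the default upright serifed style.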
Zero or more such
tags can follow a base character. So a
math bold italic H could be encoded as H<math italic><math bold> or
as H<math bold><math italic>. For the simplest "math-unaware"
search algorithms to match a given string, it's desirable to standardize on a
single order, namely the order of the list above. A
slightly more sophisticated algorithm can instead encode the tags as bits and match
tags given in any order.
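Both strategies can be sketched in Python, again assuming hypothetical tag code points starting at U+E000. The hypothetical `encode` always emits tags in the standard (ascending) order, so a plain string comparison suffices; `to_masks` reduces each base character and its tags to a `(base, bitmask)` pair, so `find` matches a tagged letter regardless of the order in which its tags were written.

```python
TAG_BASE = 0xE000    # hypothetical placeholder start of the 16-tag block
ITALIC, BOLD = 0, 1  # tag values from the proposal's list


def encode(base: str, *variants: int) -> str:
    """Emit base followed by its tags in the standard (ascending) order;
    duplicate tags are dropped."""
    return base + "".join(chr(TAG_BASE + v) for v in sorted(set(variants)))


def to_masks(text: str) -> list[tuple[str, int]]:
    """Reduce each base character and its tags to (base, bitmask);
    bit v is set when tag value v follows the base, in any order."""
    out: list[tuple[str, int]] = []
    for ch in text:
        v = ord(ch) - TAG_BASE
        if 0 <= v < 16 and out:
            base, mask = out[-1]
            out[-1] = (base, mask | (1 << v))
        else:
            out.append((ch, 0))
    return out


def find(haystack: str, needle: str) -> int:
    """Cluster index of the first order-insensitive match, or -1 if none."""
    h, n = to_masks(haystack), to_masks(needle)
    for i in range(len(h) - len(n) + 1):
        if h[i:i + len(n)] == n:
            return i
    return -1
```

With this sketch, `find("x=" + "H" + chr(TAG_BASE + BOLD) + chr(TAG_BASE + ITALIC), encode("H", ITALIC, BOLD))` returns 2, although the two strings list the tags in opposite orders. Since there are at most 16 tag values, the mask fits in 16 bits.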
To allow for cases not
covered here, it's desirable to reserve a block of 16 such math
variant tags.
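One practical effect of reserving the whole block can be sketched in Python: an implementation that recognizes every value in the reserved range as a tag still attaches a later-defined tag to its base letter instead of misreading it as a separate character. The placeholder block start U+E000 and the function name `split_base_and_tags` are hypothetical; only values 0 through 6 are listed in this proposal.

```python
TAG_BASE = 0xE000      # hypothetical placeholder start of the reserved block
DEFINED = range(0, 7)  # values 0-6 are listed in this proposal


def split_base_and_tags(cluster: str) -> tuple[str, list[int], list[int]]:
    """Split one base+tags cluster into the base character, the known tag
    values, and reserved-but-undefined tag values (kept, not discarded)."""
    base = cluster[0]
    known: list[int] = []
    unknown: list[int] = []
    for ch in cluster[1:]:
        v = ord(ch) - TAG_BASE
        if not 0 <= v < 16:
            raise ValueError(f"not a math variant tag: {ch!r}")
        (known if v in DEFINED else unknown).append(v)
    return base, known, unknown
```

For instance, `split_base_and_tags("H" + chr(TAG_BASE + 1) + chr(TAG_BASE + 9))` returns `("H", [1], [9])`: the bold tag is applied, and the not-yet-defined value 9 is preserved for a newer implementation.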