Terminal Graphics Proposal

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Wed Sep 30 1998 - 21:24:12 EDT


Sorry for the length of the following; if you're not interested,
skip it. The intention is to bring likeminded parties out of the woodwork;
if you are one, please contact me and we can continue the topic offline.

- Frank

TERMINAL GRAPHICS FOR UNICODE

  Frank da Cruz
  The Kermit Project
  Columbia University
  New York City
  http://www.columbia.edu/kermit/

  D R A F T # 1

  Wed Sep 30 21:15:31 1998

THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT. IT IS DESIGNED TO BE
VIEWED AS-IS IN A FIXED-PITCH FONT. ITS WIDEST LINE IS 79 COLUMNS. IT
CONTAINS NO TABS. IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP
A CLEAN COPY AT:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt

PLEASE SEND COMMENTS AND SUGGESTIONS TO THE AUTHOR AT:

  fdc@columbia.edu

ABSTRACT

A selection of terminal graphics characters is proposed for Unicode [24]
and ISO 10646 [19] to allow Unicode-based terminal emulation software to
(a) display glyphs that are found on popular types of terminals but
currently are not available in Unicode, and (b) interoperate with other
Unicode applications.

CONTENTS

    1. Introduction
    2. Scope
    3. Organization
    4. Graphic Representation of Control Characters
    5. Hex Bytes
    6. Math Symbols
    7. Line, Box, and Block Characters
    8. Miscellaneous Single-Cell Glyphs
    9. Unfinished Business
   10. Summary of Proposed Additional Characters
   11. References

Tables:

  4.1. IBM PC Code Page 437 C0 Graphics
  4.2. C0 Control Pictures
  4.3. C1 Control Pictures
  4.4. EBCDIC Control Pictures
  4.5. 3270 Control Pictures
  4.6. Additional Control-Like Pictures
  6.1. Supplementary Math Symbols
  7.1. Additional Line, Box, and Block Characters
  8.1. Miscellaneous Single-Cell Terminal Glyphs
 10.1. Census of New Characters

Figures:

  4.1. Control Picture Display
  5.1. Hex Byte Pictures
  7.1. "Framus" Glyphs

1. INTRODUCTION

Terminal-host communication was the dominant form of interaction between
human and computer from about 1974 (when CRTs became affordable) to about
1994 (when the Web and Windows took over the mass market). Terminal-host
communication is still widespread, especially in large organizations, and is
expected to remain so for decades to come, playing an important part in
organizations like universities, hospitals, and government agencies, as well
as corporations, with central computing facilities, for use in applications
ranging from software development and system/network administration, to email
and text-based Web access, to data entry and inquiry, to transaction
processing.

A terminal, for purposes of this document, is a device for entry and display
of text in a fixed-pitch font on a screen (or on paper) in which characters
are displayed in rows and columns of fixed size "cells". Terminals
generally display the characters of ASCII [1] or EBCDIC [13], and sometimes
also accented or non-Roman letters (or ideograms), and often also "graphic"
(non-alphabetic, non-digit, non-punctuation) characters for purposes of
line- and box-drawing, mathematics, or other special effects.

In recent years, physical terminals have largely disappeared from the scene,
their functions subsumed into PCs running terminal-emulation software
alongside other applications. Unicode has effectively met the need for
encoding the earth's writing systems, but it is not well suited to terminal
emulation since it lacks some of the required graphics characters.

Without a standard encoding for the missing glyphs, each maker of terminal
emulation software must create or contract for custom fonts with private
encodings. Such fonts are not compatible with other (otherwise compatible)
fonts on the same platform (e.g. when copying and pasting between
applications), nor with each other. Furthermore, should Unicode printers
become standard equipment on PCs, terminal graphics characters will not
print correctly on them.

This document proposes a modest repertoire of terminal graphics characters
to be added to Unicode and ISO 10646, with specific encoding to be decided
by the UTC or other appropriate body, that all makers of fonts, code pages,
and printers can refer to in designing their products, and upon which all
makers of terminal emulation software can base their screen displays.

For best results, this project should be a cooperative effort among those
who care about both terminal emulation (and emulation of particular
terminals) and the Universal Character Set. Unfortunately, in many cases
the actual owners or creators of the original terminal character sets in
question are no longer available for consultation.

2. SCOPE

This document represents a survey of the following terminals:

  Digital Equipment Corporation VT100 through VT520 [3-9]
  Heath / Zenith 19 [10]
  Hewlett Packard HP-2621 and HP-2648 [11,12]
  IBM 3164 and 3270 [15,16]
  Siemens Nixdorf 97801 [21]
  Televideo 922 and 965 [22,23]
  Wyse 60 and 370 [25,26]

as well as:

  IBM PC code page 437 [14]

which is the basis for numerous PC-oriented so-called ANSI emulations.

Even within this fairly narrow scope, the task of settling on a set of
character-cell terminal graphics for Unicode is complicated by the
well-known problems that affect other preexisting character sets to varying
degrees:

 1. Lack of official names for the characters.
 2. Lack of definitive, high-quality pictures of the glyphs.
 3. Lack of descriptions of the purpose and intended use of the glyphs.
 4. Lack of a current registration authority or owner.
 5. Questions of unification of glyphs from different terminal makers.
 6. End-user demand for specific characters or sets.

The issue of unification is complicated by the fact that many of the
terminal graphics characters are designed to join at cell boundaries to form
"pictures" (such as boxes or forms to be filled out) or large characters
(such as big math symbols) spanning multiple rows and/or columns. The
relationship of similar-looking glyphs for different terminals is difficult
to determine -- e.g. exactly where does a line touch an edge, and at what
angle, and does it make a difference? In linguistic terms, which glyphs may
be considered allographs, and which are distinct graphemes?

This proposal does not require any action for well-known terminal
presentation forms such as double-high and/or double-wide characters, bold,
blinking, inverse, underlining, color, etc, since these are not encoding
issues. In particular, no special code points are needed for double-high or
double-wide characters, such as those seen on the DEC VT100 family of
terminals, nor for compressed characters as seen on Data General and DEC
terminals.

This proposal also does not cover true graphics terminals, such as Tektronix
vector graphics units, DEC ReGIS or Sixel graphics, etc, since these
graphics regimes are not character-cell based.

Note that the graphic characters listed in this proposal rarely, if ever,
appear on keyboard key labels. In general, these characters are never
typed, not even on real terminals, but are displayed when the terminal is
commanded into a special mode; for example, with ISO 2022 [17] character-set
designation and invocation escape sequences.

3. ORGANIZATION

This proposal groups terminal graphic characters into four major categories.
Some categories are complete by definition (e.g. the 2-nibble hex codes, of
which there can be only 256), but others should include space for expansion
as new glyphs are discovered or needed. The categories are:

Debugging Tools
  Graphical single-cell representation of C0 and C1 control characters;
  hexadecimal dumps of terminal traffic, etc.

Math Symbols
  Although most math symbols found on terminals are already in Unicode,
  certain terminal-based applications rely on the ability to construct large
  symbols (integral and summation signs, braces, brackets) from smaller
  character-cell-sized pieces.

Line and Box Drawing
  Used for data entry, transaction processing, forms filling, etc, in
  markets ranging from car rental and airline reservations, to medical
  information systems, to online library catalogs. Although Unicode does
  include a basic set (mainly those at U+2500), some others are missing.

Other Miscellaneous Character-Cell Graphics
  Padlocks, stick-figure people, etc, e.g. to indicate the state of the
  keyboard and/or host application, as well as mosaic graphics cells,
  and assorted pictures and dingbats.

This document lists the terminal graphics characters for the terminals in
Section 2, suggests unifications, and assigns preliminary, temporary
Unicode values from the Private Use area:

  E000-E09F Control Pictures
  E0A0-E0CF Math Symbols
  E0D0-E0EF Line and Box Drawing
  E0F0-E0FF Miscellaneous single-cell graphic characters
  E100-E1FF Hex Bytes

This makes a total of 512 positions, not fully populated. Obviously the final
counts, code values, and block allocations, including reserved positions,
are likely to change as this proposal evolves.
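
The allocation above can be checked mechanically. The following is a
minimal sketch in Python, not part of the proposal itself; the names
BLOCKS, block_of, and total_positions are hypothetical, and the Control
Pictures block is taken to be the 160-position range E000-E09F stated in
the Section 4 summary.

```python
# Temporary Private Use Area blocks from this draft (inclusive ranges).
# Assumption: Control Pictures span E000-E09F, the 160-position range
# given in the Section 4 summary.
BLOCKS = [
    (0xE000, 0xE09F, "Control Pictures"),
    (0xE0A0, 0xE0CF, "Math Symbols"),
    (0xE0D0, 0xE0EF, "Line and Box Drawing"),
    (0xE0F0, 0xE0FF, "Miscellaneous single-cell graphic characters"),
    (0xE100, 0xE1FF, "Hex Bytes"),
]


def block_of(code_point):
    """Return the draft block name for a code point, or None if outside."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def total_positions():
    """Count the positions covered by all draft blocks together."""
    return sum(last - first + 1 for first, last, _ in BLOCKS)
```

With these ranges the blocks are contiguous and add up to the 512
positions mentioned above.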

All new characters proposed in this document should be precomposed, since no
terminals (with the exception of certain APL and ALA terminals) are capable
of composing characters on the fly from nonspacing diacritics or by
overstriking.

4. GRAPHIC REPRESENTATION OF CONTROL CHARACTERS

Several methods are available for "printing" control characters. First,
there is the de facto standard collection of dingbats in the 0x00-0x1F range
of IBM PC Code Page 437 [14]. As shown in Table 4.1, this is already
adequately covered by Unicode (in which "Code" is the Unicode value and
"IBM" is the IBM Code page value, both hexadecimal).

Table 4.1: IBM PC Code Page 437 C0 Graphics

  Code IBM Unicode Name          Code IBM Unicode Name
  00A0 00  Blank                 25BA 10  Black right-pointing pointer
  263A 01  White smiling face    25C4 11  Black left-pointing pointer
  263B 02  Black smiling face    2195 12  Up down arrow
  2665 03  Black heart suit      203C 13  Double exclamation mark
  2666 04  Black diamond suit    00B6 14  Pilcrow sign
  2663 05  Black club suit       00A7 15  Section sign
  2660 06  Black spade suit      25AC 16  Black rectangle
  2022 07  Bullet                21A8 17  Up down arrow with base
  25D8 08  Inverse bullet        2191 18  Upwards arrow
  25EF 09  Large circle          2193 19  Downwards arrow
  25D9 0A  Inverse white circle  2192 1A  Rightwards arrow
  2642 0B  Male sign             2190 1B  Leftwards arrow
  2640 0C  Female sign           2319 1C  Turned not sign
  266A 0D  Eighth note           2194 1D  Left right arrow
  266C 0E  Beamed 16th notes     25B2 1E  Black up-pointing triangle
  263C 0F  White sun with rays   25BC 1F  Black down-pointing triangle

(Note that "black" and "white" are used in accordance with Unicode terminology,
where they denote the presence or absence of (black) ink on the page;
however, any colors at all can appear on a terminal screen.)

More useful in a terminal emulator, however, is the ability to display the
official abbreviation [1,18], or "name", of the control character in a
single cell, as is done by numerous terminals, as well as by data analyzers
and line monitors, which themselves also tend to be increasingly implemented
in software on PCs.

Some control characters have two-character abbreviations (such as CR, LF,
HT, FF), while others are three characters (NUL, SOH, DC1, DLE). Some
terminals compress three-letter abbreviations to the two-character forms
shown in Table 4.2. All terminals, however, display the abbreviations
diagonally in the character cell, as shown in Figure 4.1.

Figure 4.1: Control Picture Display

 +---+   +---+
 |L  |   |D  |   (except the two-character abbreviation appears on the
 |   |   | C |    screen with the characters closer together)
 |  F|   |  1|
 +---+   +---+
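
The diagonal layout of Figure 4.1 can be sketched in a few lines of
Python. This is only an illustration: the function name diagonal_cell is
hypothetical, and the cell is drawn exactly as wide as the abbreviation
is long, which is an assumption made for the ASCII rendering rather than
a property of any particular terminal.

```python
def diagonal_cell(abbrev):
    """Draw an abbreviation diagonally inside an ASCII-art character cell.

    Assumption: the cell is as many columns wide as the abbreviation has
    characters, so character i lands on row i, column i.
    """
    n = len(abbrev)
    border = "+" + "-" * n + "+"
    rows = [border]
    for i, ch in enumerate(abbrev):
        rows.append("|" + " " * i + ch + " " * (n - i - 1) + "|")
    rows.append(border)
    return "\n".join(rows)


if __name__ == "__main__":
    print(diagonal_cell("DC1"))
    print(diagonal_cell("LF"))
```

A two-character name such as LF comes out in a narrower box here, which
loosely corresponds to the "closer together" remark in the figure.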

Unicode already has a block of Control Pictures at U+2400 through U+2421,
but (except for "NL" at U+2424) these go horizontally across the character
cell, rather than diagonally, thus making them difficult to distinguish from
normal alphanumeric text. A new, parallel block of C0 control pictures is
needed in which the abbreviations are displayed diagonally. These are
listed in Table 4.2, in which "Code" is the temporary Unicode value, "Name"
is the official (ASCII) abbreviation (and the one used in the Display
Controls character set of the VT220 family [5]), and "2X" is the 2-character
abbreviation (used in the Display Controls font of Televideo [22,23], HP [11],
Perkin Elmer [20], and other terminals).

Table 4.2: C0 Control Pictures

  Code  Name  2X      Code  Name  2X
  E000  NUL   NU      E010  DLE   DL
  E001  SOH   SH      E011  DC1   D1
  E002  STX   SX      E012  DC2   D2
  E003  ETX   EX      E013  DC3   D3
  E004  EOT   ET      E014  DC4   D4
  E005  ENQ   EQ      E015  NAK   NK
  E006  ACK   AK      E016  SYN   SY
  E007  BEL   BL      E017  ETB   EB
  E008  BS    BS      E018  CAN   CN
  E009  HT    HT      E019  EM    EM
  E00A  LF    LF      E01A  SUB   SU
  E00B  VT    VT      E01B  ESC   EC
  E00C  FF    FF      E01C  FS    FS
  E00D  CR    CR      E01D  GS    GS
  E00E  SO    SO      E01E  RS    RS
  E00F  SI    SI      E01F  US    US

There is little to gain by defining separate 2- and 3-character glyphs for
control characters that have 3-character names; therefore it is suggested
that the full abbreviation (from the Name column) be used, with the
characters arranged diagonally within each cell (rather than horizontally as
in the U+2400 block), and that the 2X column be ignored.
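
As an illustration of how an emulator might use the C0 block, here is a
minimal Python sketch. It assumes the temporary codes run sequentially
from E000 (NUL) through E01F (US), so that BS falls at E008, and that the
existing horizontal pictures sit at U+2400 plus the byte value. The names
C0_NAMES and c0_picture are hypothetical.

```python
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]

DIAGONAL_C0_BASE = 0xE000    # temporary Private Use value from Table 4.2
HORIZONTAL_C0_BASE = 0x2400  # existing Unicode Control Pictures block


def c0_picture(byte, diagonal=True):
    """Return (code point, abbreviation) for a C0 control byte 0x00-0x1F."""
    if not 0x00 <= byte <= 0x1F:
        raise ValueError("not a C0 control: %#04x" % byte)
    base = DIAGONAL_C0_BASE if diagonal else HORIZONTAL_C0_BASE
    return base + byte, C0_NAMES[byte]
```

For example, c0_picture(0x1B) yields the proposed diagonal ESC picture,
while c0_picture(0x1B, diagonal=False) yields the existing U+241B.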

C1 Control characters are specified in ISO 6429 and used in the VT220
family of terminals [5] and the Wyse 370 [26], where they are represented
in the right half of the "display controls" font as shown in Table 4.3 (DEC
terminals use the full name, Wyse terminals use the 2X name). As with C0
controls, the "name" is displayed diagonally within the character cell.
Unicode presently includes no C1 control pictures.

Table 4.3: C1 Control Pictures

  Code  Name  2X          Code  Name  2X
        80    (1)         E030  DCS   DC
        81    (1)         E031  PU1   P1
  E022  BPH   (2)         E032  PU2   P2
  E023  NBH   (2)         E033  STS   SE
  E024  IND   IN (3)      E034  CCH   CC
  E025  NEL   NL          E035  MW    MW
  E026  SSA   SS          E036  SPA   SP
  E027  ESA   ES          E037  EPA   EP
  E028  HTS   HS          E038  SOS   (2)
  E029  HTJ   HJ                99    (1)
  E02A  VTS   VS          E03A  SCI   (2)
  E02B  PLD   PD          E03B  CSI   CS
  E02C  PLU   PU          E03C  ST    ST
  E02D  RI    RI          E03D  OSC   OS
  E02E  SS2   S2          E03E  PM    PM
  E02F  SS3   S3          E03F  APC   AP

Notes:
 (1) Undefined in ISO 6429, shown on VT220/WY370 terminal by hex value.
 (2) Defined in ISO 6429, but shown on VT220/WY370 terminal by hex value.
 (3) Undefined in ISO 6429, but shown by name on VT220/WY370 terminal.

Note that three of the C1 control pictures are unassigned (the ones marked
by "(1)", that would be at U+E020, U+E021, and U+E039 if these were
assigned). These positions should be left vacant in case names are assigned
to these characters in a future revision of ISO 6429.

As with C0 controls, it is presumed acceptable to encode the full
abbreviation, without the 2-character alternatives for 3-character forms.
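
A minimal Python sketch of the C1 mapping follows. The offset arithmetic
(byte 0x80 corresponds to E020) comes from Table 4.3. Falling back to a
Hex Byte picture at E100 plus the byte value for the three unnamed
controls is an assumption on my part, modeled on the terminals that show
those bytes by hex value; the names c1_picture and C1_VACANT are
hypothetical.

```python
C1_BASE = 0xE020        # temporary code corresponding to byte 0x80
HEX_BYTE_BASE = 0xE100  # temporary Hex Bytes block
C1_VACANT = {0x80, 0x81, 0x99}  # rows marked (1): no name to display


def c1_picture(byte):
    """Return the temporary code point used to display a C1 control byte.

    Assumption: unnamed C1 controls fall back to the Hex Byte picture
    for the same byte value instead of a control picture.
    """
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control: %#04x" % byte)
    if byte in C1_VACANT:
        return HEX_BYTE_BASE + byte
    return C1_BASE + (byte - 0x80)
```

So CSI (0x9B) maps to E03B, while the unnamed byte 0x99 maps to the hex
picture at E199 and position E039 stays vacant.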

Table 4.4 shows the names of control characters unique to EBCDIC (that is,
the ones it does not share with ASCII).

Table 4.4: EBCDIC Control Pictures

  Code  Name  Description
  E040  PF    Punch Off
  E041  PN    Punch On
  E042  LC    Lower Case
  E043  UC    Upper Case
  E044  SMM   Start of Manual Message
  E045  TM    Tape Mark
  E046  RES   Restore
  E047  IL    Idle
  E048  CC    Cursor Control
  E049  CU1   Customer Use 1
  E04A  CU2   Customer Use 2
  E04B  CU3   Customer Use 3
  E04C  CU4   Customer Use 4
  E04D  IFS   Interchange File Separator
  E04E  IGS   Interchange Group Separator
  E04F  IUS   Interchange Unit Separator
  E050  DS    Digit Select
  E051  SOS   Start of Significance
  E052  BYP   Bypass
  E053  SM    Set Mode

Names for IBM 3270 terminal Orders, LU 1 SCS Control Codes, and Format
Control Orders, which are not already listed as ASCII or EBCDIC control
codes, are shown in Table 4.5, to be used in debugging 3270 data streams.

Table 4.5: 3270 Control Pictures

  Code  Name  Description
  E060  VCS   Vertical Channel Select
  E061  GE    Graphics Escape
  E062  ENP   Enable Presentation
  E063  IRS   Interchange Record Separator
  E064  INP   Inhibit Presentation
  E065  SA    Set Attribute
  E066  FMT   Format
  E067  TRN   Transparent
  E068  SF    Start Field
  E069  SFE   Start Field Extended
  E06A  SBA   Set Buffer Address
  E06B  MF    Modify Field
  E06C  PT    Program Tab
  E06D  RA    Repeat to Address
  E06E  EUA   Erase to Unprotected Address
  E06F  DUP   Duplicate
  E070  FM    Field Mark
  E071  EO    Eight Ones

Table 4.6 shows additional characters that may be included in "display
controls" mode on various terminals.

Table 4.6: Additional Control-Like Pictures

  Code  Name  Remarks
  E080  SP    Space (like U+2420 but arranged diagonally)
  E081  DEL   Delete (Rubout) (2-character name: DT)
  E082  LS1   Locking Shift 1 (ISO name for SO)
  E083  LS0   Locking Shift 0 (ISO name for SI)
  E084  IS4   ISO Name for FS: Information Separator 4
  E085  IS3   ISO Name for GS: Information Separator 3
  E086  IS2   ISO Name for RS: Information Separator 2
  E087  IS1   ISO Name for US: Information Separator 1
  E088  CL    Clear or Cancel Line (used on HP terminals)
  E089  BP    From the Data General Word Processing Set
  E08A  BE    From the Data General Word Processing Set
  E08B  FN    From the Data General Word Processing Set
  E08C  FE    From the Data General Word Processing Set
  E08D  HF    From the Data General Word Processing Set
  E08E        Diagonal crosshatches (1)
  E08F        Picture of Bell (used on HP-2621 to show BEL, 0x07)
  2422        Blank symbol (substitute blank, b with stroke) (2)
  2423        Blank symbol (open box) (2)
  2424  NL    DEC Special Graphics 0x68, EBCDIC control New Line (2)

  Notes:
   (1) Used for DEL on Televideo, HP. Similar to U+25A9, but without border.
   (2) Already in Unicode.

Summary:

  115 new characters required for graphic representation of
  control characters. Range: U+E000 through U+E09F, 160 positions with
  45 vacant for expansion.
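
The figures in this summary can be reproduced from the tables of this
section. The Python sketch below is only a cross-check; the ranges and
vacancy counts are read off Tables 4.2 through 4.6 (assuming each table
assigns its codes consecutively), and the names TABLES, assigned, and
block_size are hypothetical.

```python
# (first code, last code, vacant positions inside that range) per table.
TABLES = {
    "4.2 C0 Control Pictures": (0xE000, 0xE01F, 0),
    "4.3 C1 Control Pictures": (0xE020, 0xE03F, 3),  # E020, E021, E039
    "4.4 EBCDIC Control Pictures": (0xE040, 0xE053, 0),
    "4.5 3270 Control Pictures": (0xE060, 0xE071, 0),
    "4.6 Additional Control-Like Pictures": (0xE080, 0xE08F, 0),
}

BLOCK_FIRST, BLOCK_LAST = 0xE000, 0xE09F


def assigned():
    """Count the control pictures that actually receive a code."""
    return sum(last - first + 1 - vacant
               for first, last, vacant in TABLES.values())


def block_size():
    """Count all positions in the control-picture block."""
    return BLOCK_LAST - BLOCK_FIRST + 1
```

The tally is 32 + 29 + 20 + 18 + 16 = 115 assigned characters out of 160
positions, leaving 45 vacant.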

5. HEX BYTES

This block holds one picture for each hexadecimal byte value, two hex digits
each. The pictures are like display controls, but cover all 256 8-bit byte
values and show the byte code in hexadecimal rather than the
(context-dependent) name. They are intended for hex debugging (in terminal
emulators, line monitors, protocol analyzers, etc). The digits should be
arranged diagonally within the character cell as shown in Figure 5.1:

Figure 5.1: Hex Byte Pictures

 +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+
 |0 | |0 | |0 | ... |0 | |1 | |1 | |1 | ... |E | |F | ... |F |
 | 1| | 2| | 3|     | F| | 0| | 1| | 2|     | F| | 0|     | F|
 +--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+

One glyph is required for each hex byte code 00 through FF, or 256 glyphs
in all. Suggested temporary codes: U+E100 through U+E1FF.

Note that the SNI "IBM" character set contains glyphs for 01 through 1F,
which are shown sideways. I see no reason to encode these separately, but
others might disagree.

Summary: 256 new characters, U+E100 through U+E1FF.
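
A hex-debugging display built on this block could be as simple as the
following Python sketch. It assumes the obvious assignment in which byte
value n is shown by the temporary code E100 + n; the function names
hex_byte_picture and hex_dump are hypothetical.

```python
HEX_BYTE_BASE = 0xE100  # temporary code for the picture of byte 00


def hex_byte_picture(byte):
    """Return the one-cell hex picture character for a byte value 0-255."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit byte value: %r" % (byte,))
    return chr(HEX_BYTE_BASE + byte)


def hex_dump(data):
    """Render a bytes object as one hex picture per byte, one cell each."""
    return "".join(hex_byte_picture(b) for b in data)
```

Each byte occupies exactly one character cell, so a dump of terminal
traffic keeps the same column alignment as the original data stream.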

6. MATH SYMBOLS

Unicode has a generous supply of math symbols, and no doubt more are in the
works. And of course it also includes the Latin, Greek, Fraktur, Hebrew,
and other letters used in mathematical notation.

However, terminal emulators also need special glyphs designed to be joined
together in adjacent character cells, vertically or horizontally, to form
large math symbols such as integrals, summation signs, braces, or brackets,
such as the integral top and bottom that already exist at U+2320 and U+2321.
Several other single-cell characters are also missing, including the small
radical sign from the DEC Technical character set. Table 6.1 lists the
needed characters, along with suggested temporary codes for them. At least
one real terminal reference is shown for each character, in column/row
notation, or an IBM Graphic Character Global Identifier (GCGID) [14]. Note:
SB stands for Square Bracket.
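
For readers unfamiliar with column/row notation: the column is the high
nibble of the byte and the row is the low nibble, both written in
decimal, so 02/15 is byte 0x2F. A small Python sketch of the conversion
follows; the function names are hypothetical.

```python
def colrow_to_byte(ref):
    """Convert column/row notation such as '02/15' to a byte value."""
    col_text, row_text = ref.split("/")
    col, row = int(col_text), int(row_text)
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must be 0-15: %r" % ref)
    return col * 16 + row


def byte_to_colrow(byte):
    """Convert a byte value to column/row notation, e.g. 0x2F -> '02/15'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit byte value: %r" % (byte,))
    return "%02d/%02d" % (byte >> 4, byte & 0x0F)
```

The same conversion applies to the column/row references in the other
tables of this document.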

Table 6.1: Supplementary Math Symbols

  Code Description Reference
  E0A0 Extensible left brace middle DEC Tech 02/15
  E0A1 Extensible left parenthesis bottom DEC Tech 02/12, IBM SS210000
  E0A2 Extensible left parenthesis top DEC Tech 02/11, IBM SS200000
  E0A3 Extensible left SB bottom DEC Tech 02/08
  E0A4 Extensible left SB top DEC Tech 02/07
  E0A5 Extensible right brace middle DEC Tech 03/00
  E0A6 Extensible UR or LL brace section IBM SS240000
  E0A7 Extensible LR or UL brace section IBM SS250000
  E0A8 Extensible right parenthesis bottom DEC Tech 02/14, IBM SS230000
  E0A9 Extensible right parenthesis top DEC Tech 02/13, IBM SS220000
  E0AA Extensible right SB bottom DEC Tech 02/10
  E0AB Extensible right SB top DEC Tech 02/09
  E0AC Summation symbol bottom DEC Tech 03/02, DG Math 01/09(1)
  E0AD Summation symbol top DEC Tech 03/01, DG Math 01/08(1)
  E0AE Right ceiling corner DEC Tech 03/05
  E0AF Right floor corner DEC Tech 03/06
  E0B0 Radical symbol, small DEC Tech 02/01
  E0B1 Radical symbol with stroke DG Math 01/13
  E0B2 Superscript Latin small letter i SNI Math 03/00
  E0B3 Latin small letter a with underbar SNI Math 04/04 (2)
  E0B4 Latin capital letter H with bar SNI Math 04/05 (2)
  E0B5 Latin small letter h with bar SNI Math 04/06 (2)
  E0B6 Latin capital letter L with dot SNI Math 04/07 (2)
  E0B7 Latin small letter L with dot SNI Math 04/08 (2)
  E0B8 Latin capital letter O with underbar SNI Math 04/09 (2)
  E0B9 Latin small letter t with bar SNI Math 04/10 (2)
  E0BA Latin small script letter t with bar SNI Math 04/12 (2)
  E0BB ??? SNI Math 04/11 (3)
  E0BC ??? SNI Math 04/11 (3)
  E0BD ??? SNI Math 04/11 (3)
  E0BE Superscript almost-equal-to sign SNI IBM 06/12
  E0BF Superscript capital Greek letter Sigma SNI IBM 06/13
  E0C0 Superscript infinity sign SNI IBM 07/12
  E0C1 Superscript proportional-to sign SNI IBM 07/13

References:
  DEC Tech = Digital Equipment Corporation Technical Character Set [5]
  SNI Math = Siemens Nixdorf Mathematisch [21]
  DG Math = Data General Word-Processing, Greek, and Math Character Set [2]
  IBM = IBM Graphic Character Global Identifier (GCGID) [14]

Notes:
 (1) Also GCGID SS280000 and SS290000.
 (2) I'm not too sure about some of the SNI symbols. I'm only guessing at
     what the pictures (in the SNI 97801 manual) are supposed to mean; there
     are no accompanying character names or text.
 (3) These look like permutations of lowercase Latin letter n with hook
     (small eng), in various sizes, with or without a vertical accent mark
     on top. It's not clear to me whether these can be unified with any
     existing Unicode characters.

As far as I can tell, none of the SNI letterforms listed above are in
Unicode 2.0.

Summary: 34 new characters, Range E0A0-E0CF, with 14 positions left vacant.

7. LINE, BOX, AND BLOCK CHARACTERS

A particular need addressed by this proposal is the continued ability to
support (sometimes mission-critical) terminal-based forms-filling
applications that also require entry and display of international
characters, as terminals are replaced by PCs. So far, Unicode has provided
the international characters, but not necessarily all the needed
character-cell based forms-drawing capabilities.

Some terminals have vertical and horizontal lines that are not centered
within the character cell, and currently not found in Unicode. Others have
black rectangles or other shapes not found in the U+2580 block.

Abbreviations:
  V = Vertical
  H = Horizontal
  L = Left
  R = Right
  LL = Lower Left
  LR = Lower Right
  UL = Upper Left
  UR = Upper Right

Terminology:

Quadrant
  A black rectangle filling one quarter of a cell, with one corner in the
  center and the opposite corner at a corner of the cell. So "Quadrant UL"
  is the upper left quadrant; "Quadrant UL and UR" is the top half of the
  cell (which happens to be coincident with U+2580 and so is not included
  here).

Line
  Refers to a line that extends all the way to opposite edge(s) of a cell,
  designed to be joined to (a) line(s) in the adjacent cell(s).

Bar
  Refers to a horizontal line that does not touch any cell edges.

Wedge
  Refers to a character cell with a diagonal line connecting opposite
  corners, dividing it into two triangles; one black, the other white. Thus
  an UL Wedge is similar to U+25E9, except it fills the entire character
  cell.

Framus
  (Pick a better word!) is a shape composed of two triangles with their
  points meeting at the center of the cell to form an X with bars across the
  top and bottom, closing the open ends. A black framus has the two
  triangles filled in; a white one is in outline form. A framus with center
  bar has a horizontal line through the center of the cell.

Figure 7.1: "Framus" Glyphs

    White       Black      With Bar
   *******     *******     *******
    *   *       *****       *   *
     * *         ***         * *
      *           *       *********
     * *         ***         * *
    *   *       *****       *   *
   *******     *******     *******

Table 7.1: Additional Line, Box, and Block Characters

  Code Description References
  E0D0 L V box line, extensible H19 07/12 (1)
  E0D1 R V box line, extensible H19 07/13 (1)
  E0D2 UL Wedge H19 07/02, IBM SF870000
  E0D3 UR Wedge H19 05/14, IBM SF860000
  E0D4 LL Wedge IBM SF850000
  E0D5 LR Wedge IBM SF840000
  E0D6 H line - Scan 1 DSG 06/15, H19 07/10, WG3 05/00, TVI 09/00
  E0D7 H line - Scan 3 DSG 07/00, Wyse ANSI 01/01, WG3 05/00
        H line - Scan 5 DSG 07/01, Wyse ANSI 02/02 (2)
  E0D9 H line - Scan 7 DSG 07/02, Wyse ANSI 01/03, WG3 05/01
  E0DA H line - Scan 9 DSG 07/03, H19 07/11, WG3 05/01, TVI 09/01
  E0DB Quadrant LL H19 06/13, WG3 05/05, TVI 09/05
  E0DC Quadrant LR H19 06/12, WG3 05/04, TVI 09/04
  E0DD Quadrant UL H19 06/14, WG3 05/06, TVI 09/06
  E0DE Quadrant UL and LL and LR WG3 05/11, TVI 09/11
  E0DF Quadrant UL and LR H19 06/10 (3)
  E0E0 Quadrant UL and UR and LL WG3 05/12, TVI 09/12
  E0E1 Quadrant UL and UR and LR WG3 05/13, TVI 09/13
  E0E2 Quadrant UR H19 06/15, WG3 05/03, TVI 09/03
  E0E3 Quadrant UR and LL (for completeness)
  E0E4 Quadrant UR and LL and LR WG3 05/14, TVI 09/14
  E0E5 Full black diamond TVI 09/02 (4)
  E0E6 Black framus DGM 06/08
  E0E7 Black framus + H center bar DGM 06/09
  E0E8 White framus DGM 06/10
  E0E9 White framus + H center bar DGM 06/11
  E0EA R & L arrow to V center bar DGM 03/13
  E0EB Up arrow to H center line DGL 02/12
  E0EC R arrow to V center line DGL 02/13
  E0ED L arrow to V center line DGL 02/14
  E0EE Down arrow to H center line DGL 02/12
  E0EF Box drawing double dash H DGL 03/12 (5)

References:
  DGM = Data General Word-Processing, Greek, and Math Character Set [2]
  DGL = Data General Line Drawing Character Set [2]
  DSG = The DEC Special Graphics Character Set [5]
  H19 = The Heath/Zenith 19 Graphics Character Set [10]
  WG3 = The Wyse Graphics 3 Character Set [25]
  TVI = The Televideo 965 Multinational Character Set [23]
  IBM = Graphic Character Global Identifier (GCGID) [14]
  Wyse ANSI = Wyse 60 "Standard ANSI", "UK ANSI", and "ANSI Graphics" [25]

Notes:
  (1) The vertical box lines are near, but not touching, the left and right
      edges of the cell, respectively, and are two pixels thick on the H19
      screen. Similar to IBM GCGID SF640000 and SF650000, respectively.
  (2) The center horizontal scan line is already in Unicode at U+2500.
  (3) Only on Zenith models, not original Heathkits.
  (4) Full black diamond, with points touching center of each cell wall.
  (5) Similar to U+2504 but double rather than triple.

Also note that Quadrants UL+UR, UR+LR, LL+LR, UL+LL (half blocks) are
already encoded at block U+2580.
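
One way to see that the quadrant entries in Table 7.1 are complete is to
treat each cell as a 4-bit mask, one bit per quadrant, and map all 16
combinations to a code. The Python sketch below does this. The temporary
codes come from Table 7.1 and the half and full blocks from the existing
U+2580 block; the bit assignment, the use of an ordinary space for the
empty cell, and the names are my own assumptions for illustration.

```python
# One bit per quadrant (hypothetical bit assignment).
UL, UR, LL, LR = 1, 2, 4, 8

QUADRANT_CODES = {
    0: 0x0020,                  # empty cell: ordinary space (assumption)
    LL: 0xE0DB,
    LR: 0xE0DC,
    UL: 0xE0DD,
    UL | LL | LR: 0xE0DE,
    UL | LR: 0xE0DF,
    UL | UR | LL: 0xE0E0,
    UL | UR | LR: 0xE0E1,
    UR: 0xE0E2,
    UR | LL: 0xE0E3,
    UR | LL | LR: 0xE0E4,
    UL | UR: 0x2580,            # existing upper half block
    LL | LR: 0x2584,            # existing lower half block
    UL | LL: 0x258C,            # existing left half block
    UR | LR: 0x2590,            # existing right half block
    UL | UR | LL | LR: 0x2588,  # existing full block
}


def quadrant_code(mask):
    """Return the code point for a set of filled quadrants (mask 0-15)."""
    return QUADRANT_CODES[mask]
```

Ten of the sixteen combinations need new characters; the other six are
covered by the space and the existing block elements.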

Summary:
 31 New glyphs, Range E0D0 to E0EF, one vacancy.

8. MISCELLANEOUS SINGLE-CELL GLYPHS

Table 8.1: Miscellaneous Single-Cell Terminal Glyphs

  Code Description Reference
  E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1)
  E0F1 Box with X inside DG Math 06/07, GCGID SP500000
  E0F2 Human stick figure with hat SNI Facet 04/14
  E0F3 Clock (with hands at 3:00) SNI Klammern 05/01
  E0F4 Overscore asterisk IBM 3270
  E0F5 Overscore semicolon IBM 3270
  E0F6 Padlock (keyboard locked) IBM 3270

Notes:
 (1) The reverse question mark is essential in VT terminal emulation, where
     it indicates that an invalid code was received, or a parity or other
     error was detected. It also stands for SUB and/or RS in Wyse display
     controls mode, and is the glyph for 0xFF in the Televideo Multinational
     Character Set [23]. It is also a glyph in the DG Special Graphics
     Character Set [2].

Summary:
 7 New glyphs, Range E0F0 to E0FF, 9 vacant.

9. UNFINISHED BUSINESS

The selection of characters presented in this draft is far from
comprehensive. Hundreds of other terminals from the past 30+ years are
likely to have glyphs or entire character sets covered neither here nor
in Unicode, and these might or might not be important in some application
somewhere. Readers are invited, therefore, to propose any needed
additions, bearing in mind that Unicode code space is not unlimited.

No attempt was made to account for the many Viewdata, Videotex, Minitel,
NAPLPS, or other mosaic graphics character sets. These should be tackled,
if appropriate, by someone who knows something about them.

Several character sets found in the references consulted are ignored here,
fully or in part, due to lack of motivation (nobody has ever asked us to
support them). Obviously these, and any other missing sets, can be
considered if there is a demand.

Siemens Nixdorf Facet
  A set of 95 mosaic graphics, but not resembling any of the ISO Videotex
  mosaic sets; difficult to describe.

Siemens Nixdorf Klammern
  A set of 95 assorted blobs, bracket and brace pieces, clocks, arrows,
  hourglasses, and Greek letters, some of which are unique; others can be
  unified with existing Unicode characters or characters in this proposal.

Hewlett Packard Line Drawing
  Mostly coincident with Unicode box-drawing set at U+2500, but with a
  handful of unique characters, such as single-to-triple box intersections,
  single-to-double intersections with wide spacing, etc. These should be
  mappable to existing U+25xx glyphs without causing riots in the streets.

Hewlett Packard Big Character Pieces
  Thick line segments for drawing large characters, used on the HP-2648.

And no doubt many more...

10. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS

If all the proposed new characters are added to the UCS, this will enable
terminal emulators to fully handle at least the following terminal character
sets, which were not previously covered in full:

  ASCII/ISO Display Controls for DEC, Hewlett Packard, Televideo, and others.
  EBCDIC Display Controls for the IBM 3270
  Hexadecimal debugging
  DEC Technical
  DEC Special Graphics
  Data General Word-Processing, Greek, and Math (1)
  Data General Line Drawing
  Heath/Zenith 19 Graphics
  Hewlett Packard 2621 and HPTERM
  Siemens Nixdorf's "IBM" set (plus parts of its Klammern and Facet sets)
  Televideo Multinational
  Wyse Graphics 3 (Graphics 1 and 2 were already covered)
  Wyse "Standard ANSI", "UK ANSI", and "ANSI Graphics"
 
 (1) Except the DG logo character, which is presumed off limits.

Terminals supporting these character sets are numerous indeed. An
incomplete list includes: DEC VT100, VT102, VT220/240, VT320/330/340,
VT420, VT520/525; Data General 210, 215, 217, 413, and 463; the Heath /
Zenith 19; and numerous Televideo and Wyse models.

Table 10.1 lists the new characters proposed in this document.

Table 10.1: Census of New Characters

  Code Glyph Description
  E000 NUL Diagonal Control Picture Null
  E001 SOH Diagonal Control Picture Start of Heading
  E002 STX Diagonal Control Picture Start of Text
  E003 ETX Diagonal Control Picture End of Text
  E004 EOT Diagonal Control Picture End of Transmission
  E005 ENQ Diagonal Control Picture Enquiry
  E006 ACK Diagonal Control Picture Acknowledge
  E007 BEL Diagonal Control Picture Bell
  E008 BS Diagonal Control Picture Backspace
  E009 HT Diagonal Control Picture Horizontal Tab
  E00A LF Diagonal Control Picture Line Feed
  E00B VT Diagonal Control Picture Vertical Tab
  E00C FF Diagonal Control Picture Form Feed
  E00D CR Diagonal Control Picture Carriage Return
  E00E SO Diagonal Control Picture Shift Out
  E00F SI Diagonal Control Picture Shift In
  E010 DLE Diagonal Control Picture Data Link Escape
  E011 DC1 Diagonal Control Picture Device Control 1
  E012 DC2 Diagonal Control Picture Device Control 2
  E013 DC3 Diagonal Control Picture Device Control 3
  E014 DC4 Diagonal Control Picture Device Control 4
  E015 NAK Diagonal Control Picture Negative Acknowledge
  E016 SYN Diagonal Control Picture Synchronous Idle
  E017 ETB Diagonal Control Picture End of Transmission Block
  E018 CAN Diagonal Control Picture Cancel
  E019 EM Diagonal Control Picture End of Medium
  E01A SUB Diagonal Control Picture Substitute
  E01B ESC Diagonal Control Picture Escape
  E01C FS Diagonal Control Picture File Separator
  E01D GS Diagonal Control Picture Group Separator
  E01E RS Diagonal Control Picture Record Separator
  E01F US Diagonal Control Picture Unit Separator
  E020 (vacant)
  E021 (vacant)
  E022 BPH Diagonal Control Picture Break Permitted Here
  E023 NBH Diagonal Control Picture No Break Here
  E024 IND Diagonal Control Picture Index
  E025 NEL Diagonal Control Picture Next Line
  E026 SSA Diagonal Control Picture Start Selected Area
  E027 ESA Diagonal Control Picture End Selected Area
  E028 HTS Diagonal Control Picture Character Tabulation Set
  E029 HTJ Diagonal Control Picture Character Tabulation with Justification
  E02A VTS Diagonal Control Picture Line Tabulation Set
  E02B PLD Diagonal Control Picture Partial Line Forward
  E02C PLU Diagonal Control Picture Partial Line Backward
  E02D RI Diagonal Control Picture Reverse Line Feed
  E02E SS2 Diagonal Control Picture Single Shift 2
  E02F SS3 Diagonal Control Picture Single Shift 3
  E030 DCS Diagonal Control Picture Device Control String
  E031 PU1 Diagonal Control Picture Private Use 1
  E032 PU2 Diagonal Control Picture Private Use 2
  E033 STS Diagonal Control Picture Set Transmit State
  E034 CCH Diagonal Control Picture Cancel Character
  E035 MW Diagonal Control Picture Message Waiting
  E036 SPA Diagonal Control Picture Start Protected (Guarded) Area
  E037 EPA Diagonal Control Picture End Protected (Guarded) Area
  E038 SOS Diagonal Control Picture Start of String
  E039 (vacant)
  E03A SCI Diagonal Control Picture Single Character Introducer
  E03B CSI Diagonal Control Picture Control Sequence Introducer
  E03C ST Diagonal Control Picture String Terminator
  E03D OSC Diagonal Control Picture Operating System Command
  E03E PM Diagonal Control Picture Privacy Message
  E03F APC Diagonal Control Picture Application Program Command
  E040 PF Diagonal Control Picture Punch Off
  E041 PN Diagonal Control Picture Punch On
  E042 LC Diagonal Control Picture Lower Case
  E043 UC Diagonal Control Picture Upper Case
  E044 SMM Diagonal Control Picture Start of Manual Message
  E045 TM Diagonal Control Picture Tape Mark
  E046 RES Diagonal Control Picture Restore
  E047 IL Diagonal Control Picture Idle
  E048 CC Diagonal Control Picture Cursor Control
  E049 CU1 Diagonal Control Picture Customer Use 1
  E04A CU2 Diagonal Control Picture Customer Use 2
  E04B CU3 Diagonal Control Picture Customer Use 3
  E04C CU4 Diagonal Control Picture Customer Use 4
  E04D IFS Diagonal Control Picture Interchange File Separator
  E04E IGS Diagonal Control Picture Interchange Group Separator
  E04F IUS Diagonal Control Picture Interchange Unit Separator
  E050 DS Diagonal Control Picture Digit Select
  E051 SOS Diagonal Control Picture Start of Significance
  E052 BYP Diagonal Control Picture Bypass
  E053 SM Diagonal Control Picture Set Mode
  E054 (vacant through E05F)
  E060 VCS Vertical Channel Select
  E061 GE Graphics Escape
  E062 ENP Enable Presentation
  E063 IRS Interchange Record Separator
  E064 INP Inhibit Presentation
  E065 SA Set Attribute
  E066 FMT Format
  E067 TRN Transparent
  E068 SF Start Field
  E069 SFE Start Field Extended
  E06A SBA Set Buffer Address
  E06B MF Modify Field
  E06C PT Program Tab
  E06D RA Repeat to Address
  E06E EUA Erase to Unprotected Address
  E06F DUP Duplicate
  E070 FM Field Mark
  E071 EO Eight Ones
  E072 (vacant through E07F)
  E080 SP Diagonal Control Picture Space
  E081 DEL Diagonal Control Picture Delete
  E082 LS1 Diagonal Control Picture Locking Shift 1
  E083 LS0 Diagonal Control Picture Locking Shift 0
  E084 IS4 Diagonal Control Picture Information Separator 4
  E085 IS3 Diagonal Control Picture Information Separator 3
  E086 IS2 Diagonal Control Picture Information Separator 2
  E087 IS1 Diagonal Control Picture Information Separator 1
  E088 CL Diagonal Control Picture Cancel Line
  E089 BP Diagonal Control Picture DG Word Processing BP
  E08A BE Diagonal Control Picture DG Word Processing BE
  E08B FN Diagonal Control Picture DG Word Processing FN
  E08C FE Diagonal Control Picture DG Word Processing FE
  E08D HF Diagonal Control Picture DG Word Processing HF
  E08E Diagonal crosshatches
  E08F Picture of bell
  E090 (vacant through E09F)
  E0A0 Extensible left brace middle
  E0A1 Extensible left parenthesis bottom
  E0A2 Extensible left parenthesis top
  E0A3 Extensible left SB bottom
  E0A4 Extensible left SB top
  E0A5 Extensible right brace middle
  E0A6 Extensible UR or LL brace section
  E0A7 Extensible LR or UL brace section
  E0A8 Extensible right parenthesis bottom
  E0A9 Extensible right parenthesis top
  E0AA Extensible right SB bottom
  E0AB Extensible right SB top
  E0AC Summation symbol bottom
  E0AD Summation symbol top
  E0AE Right ceiling corner
  E0AF Right floor corner
  E0B0 Radical symbol, small
  E0B1 Radical symbol with stroke
  E0B2 Superscript Latin small letter i
  E0B3 Latin small letter a with underbar
  E0B4 Latin capital letter H with bar
  E0B5 Latin small letter h with bar
  E0B6 Latin capital letter L with dot
  E0B7 Latin small letter L with dot
  E0B8 Latin capital letter O with underbar
  E0B9 Latin small letter t with bar
  E0BA Latin small script letter t with bar
  E0BB Eng-like letter
  E0BC Eng-like letter, fatter
  E0BD Eng-like letter with vertical stroke
  E0BE Superscript almost-equal-to sign
  E0BF Superscript capital Greek letter Sigma
  E0C0 Superscript infinity sign
  E0C1 Superscript proportional-to sign
  E0C2 (vacant through E0CF)
  E0D0 L V box line, extensible
  E0D1 R V box line, extensible
  E0D2 UL Wedge
  E0D3 UR Wedge
  E0D4 LL Wedge
  E0D5 LR Wedge
  E0D6 H line - Scan 1
  E0D7 H line - Scan 3
  E0D8 (vacant)
  E0D9 H line - Scan 7
  E0DA H line - Scan 9
  E0DB Quadrant LL
  E0DC Quadrant LR
  E0DD Quadrant UL
  E0DE Quadrant UL and LL and LR
  E0DF Quadrant UL and LR
  E0E0 Quadrant UL and UR and LL
  E0E1 Quadrant UL and UR and LR
  E0E2 Quadrant UR
  E0E3 Quadrant UR and LL
  E0E4 Quadrant UR and LL and LR
  E0E5 Full black diamond
  E0E6 Black framus
  E0E7 Black framus + H center bar
  E0E8 White framus
  E0E9 White framus + H center bar
  E0EA R & L arrow to V center bar
  E0EB Up arrow to H center line
  E0EC R arrow to V center line
  E0ED L arrow to V center line
  E0EE Down arrow to H center line
  E0EF Box drawing double dash H
  E0F0 Reverse Question Mark
  E0F1 Box with X inside
  E0F2 Human stick figure with hat
  E0F3 Clock at 3:00
  E0F4 Overscore asterisk
  E0F5 Overscore semicolon
  E0F6 Padlock
  E0F7 (vacant through E0FF)
  E100 (through E1FF): Hex Bytes

Summary:
  E000 through E1FF = 512 positions, 69 vacant.
  Codes in Exxx block to be moved to a non-Private Use area.
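
The layout of Table 10.1 lets an emulator compute most control-picture
codes rather than look them up.  The following is only a sketch under
stated assumptions: the function and constant names are hypothetical,
the C0 rows are assumed to run consecutively from E000 to E01F, the C1
rows are read as E020 + (byte - 80 hex), and the hex byte pictures are
assumed to be in byte order starting at E100.  All of these are
provisional Private Use codes, not final assignments.

```python
from typing import Optional

# Provisional bases read off Table 10.1 (assumptions, see lead-in).
C0_PICTURE_BASE = 0xE000   # E000..E01F: pictures for C0 controls 00..1F
C1_PICTURE_BASE = 0xE020   # E020..E03F: pictures for C1 controls 80..9F
SP_PICTURE = 0xE080        # Diagonal Control Picture Space
DEL_PICTURE = 0xE081       # Diagonal Control Picture Delete
HEX_BYTE_BASE = 0xE100     # E100..E1FF: hex byte pictures (order assumed)

# C1 bytes whose slots the table marks vacant (E020, E021, E039).
VACANT_C1 = {0x80, 0x81, 0x99}


def control_picture(byte: int) -> Optional[int]:
    """Return the provisional code of the diagonal control picture for
    a control byte, or None if Table 10.1 has no picture for it."""
    if 0x00 <= byte <= 0x1F:
        return C0_PICTURE_BASE + byte
    if byte == 0x20:
        return SP_PICTURE
    if byte == 0x7F:
        return DEL_PICTURE
    if 0x80 <= byte <= 0x9F and byte not in VACANT_C1:
        return C1_PICTURE_BASE + (byte - 0x80)
    return None


def hex_byte_picture(byte: int) -> int:
    """Return the provisional code of the hex byte picture for a byte."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("byte out of range: %r" % (byte,))
    return HEX_BYTE_BASE + byte
```

For example, control_picture(0x9B) yields E03B, the CSI row of the
table, and hex_byte_picture(0xFF) yields E1FF, the last position of
the block.  If the characters are moved out of the Private Use area,
only the base constants should need to change, provided the relative
order is kept.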

11. REFERENCES

 [1] American National Standards Institute, ANSI X3.4-1986, Code for
     Information Interchange (ASCII), 1986.

 [2] Data General, Programming the Display Terminal: Models D217, D413, and
     D463, Westboro, MA, 1991.

 [3] Digital Equipment Corporation, VT100 User Guide, EK-VT100-UG-002,
     Maynard, MA, 1979.

 [4] Digital Equipment Corporation, VT102 Video Terminal User Guide,
     EK-VT102-UG-003, Maynard, MA, 1982.

 [5] Digital Equipment Corporation, VT220 Owner's Manual, EK-VT220-UG-003,
     Maynard, MA, 1984.

 [6] Digital Equipment Corporation, VT240 Series Programmer Reference
     Manual, EK-VT240-RM-002, Maynard, MA, 1984.

 [7] Digital Equipment Corporation, VT330/VT340 Programmer Reference Manual,
     Volume 1: Text Programming, EK-VT3XX-TP-002, Maynard, MA, 1988.

 [8] Digital Equipment Corporation, Installing and Using the VT420 Video
     Terminal, EK-VT420-UG.002, Maynard, MA, 1988.

 [9] Digital Equipment Corporation, VT520/VT525 Video Terminal Programmer
     Information, EK-VT520-RM.A01, Maynard, MA, 1994.

[10] Heathkit Manual for the Video Terminal Model H19, The Heath Company,
     Benton Harbor, MI, 1979.

[11] Hewlett Packard 2621A/P Interactive Terminal Owner's Manual, 1978.

[12] Hewlett Packard 2648A Graphics Terminal Reference Manual, 1977.

[13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie,
     NY, 1970.

[14] IBM National Language Design Guide, Volume 2: National Language
     Support Reference Manual, 4th Edition, North York, ON, 1994.

[15] IBM 3270 Information Display System, Data Stream Programmer's
     Reference, GA23-0059-06, 1991.

[16] IBM 3164 ASCII Color Display Station Description, GA18-2317-1, 1986.

[17] ISO International Standard 2022, Information processing -- ISO
     7-bit and 8-bit coded character sets -- Code extension techniques,
     Third Edition, Geneva, 1986.

[18] ISO/IEC International Standard 6429, Information technology --
     Control functions for coded character sets, Third Edition, Geneva, 1992.

[19] ISO/IEC 10646-1, International Standard 10646,
     Information Processing -- Multiple-Octet Coded Character Set,
     1993-now.

[20] Perkin Elmer Model 1100 User's Manual, Randolph, NJ, 1978.

[21] Siemens Nixdorf, Bildschirmeinheit 97801-5xx Schnittstellen,
     Benutzerhandbuch, München, 1991.

[22] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale, CA,
     1984.

[23] Televideo 965 Video Terminal Display Operator's Manual, Sunnyvale, CA,
     1988.

[24] The Unicode Standard, Version 2.0, Addison-Wesley Developers
     Press, 1996.

[25] Wyse WY-60 Programmer's Guide, Wyse Technology, San Jose, CA, 1987.

[26] Wyse WY-370 Programmer's Guide, Wyse Technology, San Jose, CA, 1990.

(End)


