Sorry for the length of the following; if you're not interested,
skip it. The intention is to bring likeminded parties out of the woodwork;
if you are one, please contact me and we can continue the topic offline.
- Frank
TERMINAL GRAPHICS FOR UNICODE
Frank da Cruz
The Kermit Project
Columbia University
New York City
http://www.columbia.edu/kermit/
D R A F T # 1
Wed Sep 30 21:15:31 1998
THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT. IT IS DESIGNED TO BE
VIEWED AS-IS IN A FIXED-PITCH FONT. ITS WIDEST LINE IS 79 COLUMNS. IT
CONTAINS NO TABS. IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP
A CLEAN COPY AT:
ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt
PLEASE SEND COMMENTS AND SUGGESTIONS TO THE AUTHOR AT:
ABSTRACT
A selection of terminal graphics characters is proposed for Unicode [24]
and ISO 10646 [19] to allow Unicode-based terminal emulation software to
(a) display glyphs that are found on popular types of terminals but
currently are not available in Unicode, and (b) interoperate with other
Unicode applications.
CONTENTS
1. Introduction
2. Scope
3. Organization
4. Graphic Representation of Control Characters
5. Hex Bytes
6. Math Symbols
7. Line, Box, and Block Characters
8. Miscellaneous Single-Cell Glyphs
9. Unfinished Business
10. Summary of Proposed Additional Characters
11. References
Tables:
4.1. IBM PC Code Page 437 C0 Graphics
4.2. C0 Control Pictures
4.3. C1 Control Pictures
4.4. EBCDIC Control Pictures
4.5. 3270 Control Pictures
4.6. Additional Control-Like Pictures
6.1. Supplementary Math Symbols
7.1. Additional Line, Box, and Block Characters
8.1. Miscellaneous Single-Cell Terminal Glyphs
10.1. Census of New Characters
Figures:
4.1. Control Picture Display
5.1. Hex Byte Pictures
7.1. "Framus" Glyphs
1. INTRODUCTION
Terminal-host communication was the dominant form of interaction between
human and computer from about 1974 (when CRTs became affordable) to about
1994 (when the Web and Windows took over the mass market). It is still
widespread and is expected to remain so for decades to come, especially in
large organizations with central computing facilities, such as universities,
hospitals, government agencies, and corporations. Its applications range
from software development and system/network administration, to email and
text-based Web access, to data entry and inquiry, to transaction processing.
A terminal, for purposes of this document, is a device for entry and display
of text in a fixed-pitch font on a screen (or on paper) in which characters
are displayed in rows and columns of fixed size "cells". Terminals
generally display the characters of ASCII [1] or EBCDIC [13], and sometimes
also accented or non-Roman letters (or ideograms), and often also "graphic"
(non-alphabetic, non-digit, non-punctuation) characters for purposes of
line- and box-drawing, mathematics, or other special effects.
In recent years, physical terminals have largely disappeared from the scene,
their functions subsumed into PCs running terminal-emulation software
alongside other applications. Unicode has effectively met the need for
encoding the earth's writing systems, but it is not well suited to terminal
emulation since it lacks some of the required graphics characters.
Without a standard encoding for the missing glyphs, each maker of terminal
emulation software must create or contract for custom fonts with private
encodings. Such fonts are not compatible with other (otherwise compatible)
fonts on the same platform (e.g. when copying and pasting between
applications), nor with each other. Furthermore, should Unicode printers
become standard equipment on PCs, terminal graphics characters will not
print correctly on them.
This document proposes a modest repertoire of terminal graphics characters
to be added to Unicode and ISO 10646, with specific encoding to be decided
by the UTC or other appropriate body, that all makers of fonts, code pages,
and printers can refer to in designing their products, and upon which all
makers of terminal emulation software can base their screen displays.
For best results, this project should be a cooperative effort among those
who care about both terminal emulation (and emulation of particular
terminals) and the Universal Character Set. Unfortunately, in many cases
the actual owners or creators of the original terminal character sets in
question are no longer available for consultation.
2. SCOPE
This document represents a survey of the following terminals:
Digital Equipment Corporation VT100 through VT520 [3-9]
Heath / Zenith 19 [10]
Hewlett Packard HP-2621 and HP-2648 [11,12]
IBM 3164 and 3270 [15,16]
Siemens Nixdorf 97801 [21]
Televideo 922 and 965 [22,23]
Wyse 60 and 370 [25,26]
as well as:
IBM PC code page 437 [14]
which is the basis for numerous PC-oriented so-called ANSI emulations.
Even within this fairly narrow scope, the task of settling on a set of
character-cell terminal graphics for Unicode is complicated by the
well-known problems that affect other preexisting character sets to varying
degrees:
1. Lack of official names for the characters.
2. Lack of definitive, high-quality pictures of the glyphs.
3. Lack of descriptions of the purpose and intended use of the glyphs.
4. Lack of a current registration authority or owner.
5. Questions of unification of glyphs from different terminal makers.
6. End-user demand for specific characters or sets.
The issue of unification is complicated by the fact that many of the
terminal graphics characters are designed to join at cell boundaries to form
"pictures" (such as boxes or forms to be filled out) or large characters
(such as big math symbols) spanning multiple rows and/or columns. The
relationship of similar-looking glyphs for different terminals is difficult
to determine -- e.g. exactly where does a line touch an edge, and at what
angle, and does it make a difference? In linguistic terms, which glyphs may
be considered allographs, and which are distinct graphemes?
This proposal does not require any action for well-known terminal
presentation forms such as double-high and/or double-wide characters, bold,
blinking, inverse, underlining, color, etc, since these are not encoding
issues. In particular, no special code points are needed for double-high or
double-wide characters, such as those seen on the DEC VT100 family of
terminals, nor for compressed characters as seen on Data General and DEC
terminals.
This proposal also does not cover true graphics terminals, such as Tektronix
vector graphics units, DEC ReGIS or Sixel graphics, etc, since these
graphics regimes are not character-cell based.
Note that the graphic characters listed in this proposal rarely, if ever,
appear on keyboard key labels. In general, these characters are never
typed, not even on real terminals, but are displayed when the terminal is
commanded into a special mode; for example, with ISO 2022 [17] character-set
designation and invocation escape sequences.
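As a rough illustration of the mode switching just described, the following
Python sketch tracks which character set is designated to G0 and translates
incoming bytes accordingly. It assumes the DEC convention that ESC ( 0
designates DEC Special Graphics and ESC ( B designates ASCII. The function
name is hypothetical, and the sketch handles only the five horizontal scan
lines, using the temporary codes this draft assigns in Table 7.1; it is not
a complete ISO 2022 implementation.

```python
ESC = 0x1B

# DEC Special Graphics positions 06/15 through 07/03 (horizontal scan lines),
# mapped to the temporary codes of Table 7.1; scan 5 is already U+2500.
DSG_SCAN_LINES = {
    0x6F: 0xE0D6,  # H line, scan 1
    0x70: 0xE0D7,  # H line, scan 3
    0x71: 0x2500,  # H line, scan 5 (already in Unicode)
    0x72: 0xE0D9,  # H line, scan 7
    0x73: 0xE0DA,  # H line, scan 9
}


def translate(data: bytes) -> str:
    """Translate 7-bit terminal data, honoring ESC ( 0 and ESC ( B."""
    out = []
    graphics = False  # True while DEC Special Graphics is designated to G0
    i = 0
    while i < len(data):
        b = data[i]
        # A three-byte G0 designation: ESC, "(", final byte.
        if b == ESC and data[i + 1:i + 2] == b"(" and i + 2 < len(data):
            final = data[i + 2]
            if final == ord("0"):
                graphics = True
            elif final == ord("B"):
                graphics = False
            # Any other final byte is consumed and ignored in this sketch.
            i += 3
            continue
        if graphics and b in DSG_SCAN_LINES:
            out.append(chr(DSG_SCAN_LINES[b]))
        else:
            out.append(chr(b))
        i += 1
    return "".join(out)
```

The same byte (here 0x71, "q") is thus displayed either as a letter or as a
line segment depending on the designation in effect, which is why these
glyphs need code points of their own once the data leaves the terminal.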
3. ORGANIZATION
This proposal groups terminal graphic characters into four major categories.
Some categories are complete by definition (e.g. the 2-nibble hex codes, of
which there can be only 256), but others should include space for expansion
as new glyphs are discovered or needed. The categories are:
Debugging Tools
Graphical single-cell representation of C0 and C1 control characters;
hexadecimal dumps of terminal traffic, etc.
Math Symbols
Although most math symbols found on terminals are already in Unicode,
certain terminal-based applications rely on the ability to construct large
symbols (integral and summation signs, braces, brackets) from smaller
character-cell-sized pieces.
Line and Box Drawing
Used for data entry, transaction processing, forms filling, etc, in
markets ranging from car rental and airline reservations, to medical
information systems, to online library catalogs. Although Unicode does
include a basic set (mainly those at U+2500), some others are missing.
Other Miscellaneous Character-Cell graphics.
Padlocks, stick-figure people, etc, e.g. to indicate the state of the
keyboard and/or host application, as well as mosaic graphics cells,
and assorted pictures and dingbats.
This document lists the terminal graphics characters for the terminals in
Section 2, suggests unifications, and assigns preliminary, temporary
Unicode values from the Private Use area:
E000-E09F Control Pictures
E0A0-E0CF Math Symbols
E0D0-E0EF Line and Box Drawing
E0F0-E0FF Miscellaneous single-cell graphic characters
E100-E1FF Hex Bytes
This makes a total of 512 positions, not fully populated. Obviously the final
counts, code values, and block allocations, including reserved positions,
are likely to change as this proposal evolves.
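The arithmetic behind the 512 positions can be checked with a short Python
sketch. The table and function names are hypothetical. The Control Pictures
range is taken as E000-E09F, matching the 160 positions stated in the
Section 4 summary, and all values are the temporary Private Use codes of
this draft, not final assignments.

```python
# Provisional Private Use allocations from this draft: (first, last, name).
BLOCKS = [
    (0xE000, 0xE09F, "Control Pictures"),
    (0xE0A0, 0xE0CF, "Math Symbols"),
    (0xE0D0, 0xE0EF, "Line and Box Drawing"),
    (0xE0F0, 0xE0FF, "Miscellaneous"),
    (0xE100, 0xE1FF, "Hex Bytes"),
]


def block_of(code_point):
    """Return the name of the provisional block holding code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def total_positions():
    """Count every position (assigned or vacant) across all blocks."""
    return sum(last - first + 1 for first, last, _ in BLOCKS)
```

With these ranges the blocks are contiguous, and total_positions() comes to
160 + 48 + 32 + 16 + 256 = 512.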
All new characters proposed in this document should be precomposed, since no
terminals (with the exception of certain APL and ALA terminals) are capable
of composing characters on the fly from nonspacing diacritics or by
overstriking.
4. GRAPHIC REPRESENTATION OF CONTROL CHARACTERS
Several methods are available for "printing" control characters. First,
there is the de facto standard collection of dingbats in the 0x00-0x1F range
of IBM PC Code Page 437 [14]. As shown in Table 4.1, this is already
adequately covered by Unicode (in which "Code" is the Unicode value and
"IBM" is the IBM Code page value, both hexadecimal).
Table 4.1: IBM PC Code Page 437 C0 Graphics
Code  IBM  Unicode Name           Code  IBM  Unicode Name
00A0  00   Blank                  25BA  10   Black right-pointing pointer
263A  01   White smiling face     25C4  11   Black left-pointing pointer
263B  02   Black smiling face     2195  12   Up down arrow
2665  03   Black heart suit       203C  13   Double exclamation mark
2666  04   Black diamond suit     00B6  14   Pilcrow sign
2663  05   Black club suit        00A7  15   Section sign
2660  06   Black spade suit       25AC  16   Black rectangle
2022  07   Bullet                 21A8  17   Up down arrow with base
25D8  08   Inverse bullet         2191  18   Upwards arrow
25EF  09   Large circle           2193  19   Downwards arrow
25D9  0A   Inverse white circle   2192  1A   Rightwards arrow
2642  0B   Male sign              2190  1B   Leftwards arrow
2640  0C   Female sign            2319  1C   Turned not sign
266A  0D   Eighth note            2194  1D   Left right arrow
266C  0E   Beamed 16th notes      25B2  1E   Black up-pointing triangle
263C  0F   White sun with rays    25BC  1F   Black down-pointing triangle
(Note that "black" and "white" are used in accordance with Unicode terminology,
where they denote the presence or absence of (black) ink on the page;
however, any colors at all can appear on a terminal screen.)
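For an emulator, Table 4.1 amounts to a 32-entry lookup. A minimal Python
sketch follows; the names are hypothetical, and the values are copied from
Table 4.1 exactly as given in this draft (other published Code Page 437
mappings may differ in a few positions).

```python
# Unicode values for IBM PC Code Page 437 positions 0x00-0x1F, in order,
# copied from Table 4.1 of this draft.
CP437_C0 = [
    0x00A0, 0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022,
    0x25D8, 0x25EF, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266C, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x2319, 0x2194, 0x25B2, 0x25BC,
]


def cp437_c0_to_unicode(byte):
    """Return the Table 4.1 character for a code page 437 byte in 0x00-0x1F."""
    if not 0x00 <= byte <= 0x1F:
        raise ValueError("not a C0 position: 0x%02X" % byte)
    return chr(CP437_C0[byte])
```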
More useful in a terminal emulator, however, is the ability to display
the official abbreviation [1,18], or "name", of the control character in a
single cell, as is done by numerous terminals, as well as by data analyzers
and line monitors, which themselves also tend to be increasingly implemented
in software on PCs.
Some control characters have two-character abbreviations (such as CR, LF,
HT, FF), while others are three characters (NUL, SOH, DC1, DLE). Some
terminals compress three-letter abbreviations to the two-character forms
shown in Table 4.2. All terminals, however, display the abbreviations
diagonally in the character cell, as shown in Figure 4.1.
Figure 4.1: Control Picture Display
+---+    +---+
|L  |    |D  |    (except the two-character abbreviation appears on the
|   |    | C |    screen with the characters closer together)
|  F|    |  1|
+---+    +---+
Unicode already has a block of Control Pictures at U+2400 through U+2421,
but (except for "NL" at U+2424) these go horizontally across the character
cell, rather than diagonally, thus making them difficult to distinguish from
normal alphanumeric text. A new, parallel block of C0 control pictures is
needed in which the abbreviations are displayed diagonally. These are
listed in Table 4.2, in which "Code" is the temporary Unicode value, "Name"
is the official (ASCII) abbreviation (and the one used in the Display
Controls character set of the VT220 family [5]), and "2X" is the 2-character
abbreviation (used in the Display Controls font of Televideo [22,23], HP [11],
Perkin Elmer [20], and other terminals).
Table 4.2: C0 Control Pictures
Code  Name  2X      Code  Name  2X
E000  NUL   NU      E010  DLE   DL
E001  SOH   SH      E011  DC1   D1
E002  STX   SX      E012  DC2   D2
E003  ETX   EX      E013  DC3   D3
E004  EOT   ET      E014  DC4   D4
E005  ENQ   EQ      E015  NAK   NK
E006  ACK   AK      E016  SYN   SY
E007  BEL   BL      E017  ETB   EB
E008  BS    BS      E018  CAN   CN
E009  HT    HT      E019  EM    EM
E00A  LF    LF      E01A  SUB   SU
E00B  VT    VT      E01B  ESC   EC
E00C  FF    FF      E01C  FS    FS
E00D  CR    CR      E01D  GS    GS
E00E  SO    SO      E01E  RS    RS
E00F  SI    SI      E01F  US    US
There is little to gain by defining separate 2- and 3-character glyphs for
control characters that have 3-character names; therefore it is suggested
that the full abbreviation (from the Name column) be used, with the
characters arranged diagonally within each cell (rather than horizontally as
in the U+2400 block), and that the 2X column be ignored.
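Because the proposed block runs parallel to the existing one, each picture
sits at a base value plus the control character's own code (so BS, 0x08,
lands on E008). The Python sketch below illustrates this; the function names
are hypothetical and the E000 base is only the temporary Private Use value
used in this draft.

```python
EXISTING_BASE = 0x2400   # horizontal Control Pictures already in Unicode
DIAGONAL_BASE = 0xE000   # temporary base for the proposed diagonal pictures


def c0_picture(byte, diagonal=True):
    """Return the control picture for a C0 control code (0x00-0x1F)."""
    if not 0x00 <= byte <= 0x1F:
        raise ValueError("not a C0 control: 0x%02X" % byte)
    base = DIAGONAL_BASE if diagonal else EXISTING_BASE
    return chr(base + byte)


def show_controls(text):
    """Replace each C0 control in text with its diagonal control picture."""
    return "".join(
        c0_picture(ord(ch)) if ord(ch) <= 0x1F else ch for ch in text
    )
```

A "display controls" mode then reduces to passing received text through
show_controls() before it reaches the screen.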
C1 control characters are specified in ISO 6429 and used in the VT220
family of terminals [5] and the Wyse 370 [26], where they are represented
in the right half of the "display controls" font as shown in Table 4.3 (DEC
terminals use the full name, Wyse terminals use the 2X name). As with C0
controls, the "name" is displayed diagonally within the character cell.
Unicode presently includes no C1 control pictures.
Table 4.3: C1 Control Pictures
Code  Name  2X        Code  Name  2X
  80        (1)       E030  DCS   DC
  81        (1)       E031  PU1   P1
E022  BPH   (2)       E032  PU2   P2
E023  NBH   (2)       E033  STS   SE
E024  IND   IN (3)    E034  CCH   CC
E025  NEL   NL        E035  MW    MW
E026  SSA   SS        E036  SPA   SP
E027  ESA   ES        E037  EPA   EP
E028  HTS   HS        E038  SOS   (2)
E029  HTJ   HJ          99        (1)
E02A  VTS   VS        E03A  SCI   (2)
E02B  PLD   PD        E03B  CSI   CS
E02C  PLU   PU        E03C  ST    ST
E02D  RI    RI        E03D  OSC   OS
E02E  SS2   S2        E03E  PM    PM
E02F  SS3   S3        E03F  APC   AP
Notes:
(1) Undefined in ISO 6429, shown on VT220/WY370 terminal by hex value.
(2) Defined in ISO 6429, but shown on VT220/WY370 terminal by hex value.
(3) Undefined in ISO 6429, but shown as indicated on VT220/WY370 terminal.
Note that three of the C1 control pictures are unassigned (the ones marked
by "(1)", that would be at U+E020, U+E021, and U+E039 if these were
assigned). These positions should be left vacant in case names are assigned
to these characters in a future revision of ISO 6429.
As with C0 controls, it is presumed acceptable to encode the full
abbreviation, without the 2-character alternatives for 3-character forms.
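The C1 pictures follow the same pattern, offset so that 0x80 corresponds to
E020, with the three positions marked "(1)" left vacant. In the Python
sketch below, the fallback for those three bytes (showing the hex byte
picture from Section 5 instead) is an assumption modeled on what the
VT220 and WY370 do, and the names are hypothetical.

```python
C1_BASE = 0xE020                   # temporary code for byte 0x80
HEX_BASE = 0xE100                  # temporary base of the hex byte pictures
C1_UNASSIGNED = {0x80, 0x81, 0x99}  # marked "(1)" in Table 4.3


def c1_picture(byte):
    """Return the picture for a C1 control code (0x80-0x9F)."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control: 0x%02X" % byte)
    if byte in C1_UNASSIGNED:
        # No name exists, so fall back to the two-digit hex picture.
        return chr(HEX_BASE + byte)
    return chr(C1_BASE + (byte - 0x80))
```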
Table 4.4 shows the names of control characters unique to EBCDIC (that is,
the ones it does not share with ASCII).
Table 4.4: EBCDIC Control Pictures
Code Name Description
E040 PF Punch Off
E041 PN Punch On
E042 LC Lower Case
E043 UC Upper Case
E044 SMM Start of Manual Message
E045 TM Tape Mark
E046 RES Restore
E047 IL Idle
E048 CC Cursor Control
E049 CU1 Customer Use 1
E04A CU2 Customer Use 2
E04B CU3 Customer Use 3
E04C CU4 Customer Use 4
E04D IFS Interchange File Separator
E04E IGS Interchange Group Separator
E04F IUS Interchange Unit Separator
E050 DS Digit Select
E051 SOS Start of Significance
E052 BYP Bypass
E053 SM Set Mode
Names for IBM 3270 terminal Orders, LU 1 SCS Control Codes, and Format
Control Orders, which are not already listed as ASCII or EBCDIC control
codes, are shown in Table 4.5, to be used in debugging 3270 data streams.
Table 4.5: 3270 Control Pictures
Code Name Description
E060 VCS Vertical Channel Select
E061 GE Graphics Escape
E062 ENP Enable Presentation
E063 IRS Interchange Record Separator
E064 INP Inhibit Presentation
E065 SA Set Attribute
E066 FMT Format
E067 TRN Transparent
E068 SF Start Field
E069 SFE Start Field Extended
E06A SBA Set Buffer Address
E06B MF Modify Field
E06C PT Program Tab
E06D RA Repeat to Address
E06E EUA Erase to Unprotected Address
E06F DUP Duplicate
E070 FM Field Mark
E071 EO Eight Ones
Table 4.6 shows additional characters that may be included in "display
controls" mode on various terminals.
Table 4.6: Additional Control-Like Pictures
Code Name Remarks
E080 SP Space (like U+2420 but arranged diagonally)
E081 DEL Delete (Rubout) (2-character name: DT)
E082 LS1 Locking Shift 1 (ISO name for SO)
E083 LS0 Locking Shift 0 (ISO name for SI)
E084 IS4 ISO Name for FS: Information Separator 4
E085 IS3 ISO Name for GS: Information Separator 3
E086 IS2 ISO Name for RS: Information Separator 2
E087 IS1 ISO Name for US: Information Separator 1
E088 CL Clear or Cancel Line (used on HP terminals)
E089 BP From the Data General Word Processing Set
E08A BE From the Data General Word Processing Set
E08B FN From the Data General Word Processing Set
E08C FE From the Data General Word Processing Set
E08D HF From the Data General Word Processing Set
E08E Diagonal crosshatches (1)
E08F Picture of Bell (used on HP-2621 to show BEL, 0x07)
2422 Blank symbol (substitute blank, b with stroke) (2)
2423 Blank symbol (open box) (2)
2424 NL DEC Special Graphics 0x68, EBCDIC control New Line (2)
Notes:
(1) Used for DEL on Televideo, HP. Similar to U+25A9, but without border.
(2) Already in Unicode.
Summary:
115 new characters required for graphic representation of
control characters. Range: U+E000 through U+E09F, 160 positions with
45 vacant for expansion.
5. HEX BYTES
These are pictures of hexadecimal byte values, two hex digits each. They are
like display controls, but cover all 256 8-bit byte values and show the byte
code in hexadecimal rather than the (context-dependent) name. They are
intended for hex debugging (in terminal emulators, line monitors, protocol
analyzers, etc). The digits should be arranged diagonally within the
character cell as shown in Figure 5.1:
Figure 5.1: Hex Byte Pictures
+--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+
|0 | |0 | |0 | ... |0 | |1 | |1 | |1 | ... |E | |F | ... |F |
| 1| | 2| | 3|     | F| | 0| | 1| | 2|     | F| | 0|     | F|
+--+ +--+ +--+     +--+ +--+ +--+ +--+     +--+ +--+     +--+
One glyph is required for each hex byte code 00 through FF, or 256 glyphs
in all. Suggested temporary codes: U+E100 through U+E1FF.
Note that the SNI "IBM" character set contains glyphs for 01 through 1F,
which are shown sideways. I see no reason to encode these separately, but
others might disagree.
Summary: 256 new characters, U+E100 through U+E1FF.
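With the suggested temporary codes, a hex dump is a one-line mapping from
byte value to code point. The Python sketch below is illustrative only: the
names are hypothetical, and the ASCII fallback is an assumption about what
an application might do when no font with these glyphs is available.

```python
HEX_BASE = 0xE100  # temporary code for byte 0x00


def hex_picture(byte):
    """Return the single-cell hex picture for a byte value 0x00-0xFF."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("not a byte value: %r" % (byte,))
    return chr(HEX_BASE + byte)


def hex_dump(data):
    """Render bytes as one hex picture per byte (one screen cell each)."""
    return "".join(hex_picture(b) for b in data)


def ascii_fallback(picture):
    """Spell a hex picture out as two ordinary ASCII hex digits."""
    return "%02X" % (ord(picture) - HEX_BASE)
```

Each byte of traffic then occupies exactly one character cell, so a dump
keeps the same column positions as the data it represents.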
6. MATH SYMBOLS
Unicode has a generous supply of math symbols, and no doubt more are in the
works. And of course it also includes the Latin, Greek, Fraktur, Hebrew,
and other letters used in mathematical notation.
However, terminal emulators also need special glyphs designed to be joined
together in adjacent character cells, vertically or horizontally, to form
large math symbols such as integrals, summation signs, braces, or brackets,
in the manner of the integral top and bottom that already exist at U+2320
and U+2321.
Several other single-cell characters are also missing, including the small
radical sign from the DEC Technical character set. Table 6.1 lists the
needed characters, along with suggested temporary codes for them. At least
one real terminal reference is shown for each character, in column/row
notation, or an IBM Graphic Character Global Identifier (GCGID) [14]. Note:
SB stands for Square Bracket.
Table 6.1: Supplementary Math Symbols
Code Description Reference
E0A0 Extensible left brace middle DEC Tech 02/15
E0A1 Extensible left parenthesis bottom DEC Tech 02/12, IBM SS210000
E0A2 Extensible left parenthesis top DEC Tech 02/11, IBM SS200000
E0A3 Extensible left SB bottom DEC Tech 02/08
E0A4 Extensible left SB top DEC Tech 02/07
E0A5 Extensible right brace middle DEC Tech 03/00
E0A6 Extensible UR or LL brace section IBM SS240000
E0A7 Extensible LR or UL brace section IBM SS250000
E0A8 Extensible right parenthesis bottom DEC Tech 02/14, IBM SS230000
E0A9 Extensible right parenthesis top DEC Tech 02/13, IBM SS220000
E0AA Extensible right SB bottom DEC Tech 02/10
E0AB Extensible right SB top DEC Tech 02/09
E0AC Summation symbol bottom DEC Tech 03/02, DG Math 01/09(1)
E0AD Summation symbol top DEC Tech 03/01, DG Math 01/08(1)
E0AE Right ceiling corner DEC Tech 03/05
E0AF Right floor corner DEC Tech 03/06
E0B0 Radical symbol, small DEC Tech 02/01
E0B1 Radical symbol with stroke DG Math 01/13
E0B2 Superscript Latin small letter i SNI Math 03/00
E0B3 Latin small letter a with underbar SNI Math 04/04 (2)
E0B4 Latin capital letter H with bar SNI Math 04/05 (2)
E0B5 Latin small letter h with bar SNI Math 04/06 (2)
E0B6 Latin capital letter L with dot SNI Math 04/07 (2)
E0B7 Latin small letter L with dot SNI Math 04/08 (2)
E0B8 Latin capital letter O with underbar SNI Math 04/09 (2)
E0B9 Latin small letter t with bar SNI Math 04/10 (2)
E0BA Latin small script letter t with bar SNI Math 04/12 (2)
E0BB ??? SNI Math 04/11 (3)
E0BC ??? SNI Math 04/11 (3)
E0BD ??? SNI Math 04/11 (3)
E0BE Superscript almost-equal-to sign SNI IBM 06/12
E0BF Superscript capital Greek letter Sigma SNI IBM 06/13
E0C0 Superscript infinity sign SNI IBM 07/12
E0C1 Superscript proportional-to sign SNI IBM 07/13
References:
DEC Tech = Digital Equipment Corporation Technical Character Set [5]
SNI Math = Siemens Nixdorf Mathematisch [21]
DG Math = Data General Word-Processing, Greek, and Math Character Set [2]
IBM = IBM Graphic Character Global Identifier (GCGID) [14]
Notes:
(1) Also GCGID SS280000 and SS290000.
(2) I'm not too sure about some of the SNI symbols. I'm only guessing at
what the pictures (in the SNI 97801 manual) are supposed to mean; there
are no accompanying character names or text.
(3) These look like permutations of lowercase Latin letter n with hook
(small eng), in various sizes, with or without a vertical accent mark
on top. It's not clear to me whether these can be unified with any
existing Unicode characters.
As far as I can tell, none of the SNI letterforms listed above are in
Unicode 2.0.
Summary: 34 new characters, Range E0A0-E0CF, with 14 positions left vacant.
7. LINE, BOX, AND BLOCK CHARACTERS
A particular need addressed by this proposal is the continued ability to
support (sometimes mission-critical) terminal-based forms-filling
applications that also require entry and display of international
characters, as terminals are replaced by PCs. So far, Unicode has provided
the international characters, but not necessarily all the needed
character-cell based forms-drawing capabilities.
Some terminals have vertical and horizontal lines that are not centered
within the character cell, and currently not found in Unicode. Others have
black rectangles or other shapes not found in the U+2580 block.
Abbreviations:
V = Vertical
H = Horizontal
L = Left
R = Right
LL = Lower Left
LR = Lower Right
UL = Upper Left
UR = Upper Right
Terminology:
Quadrant
A black rectangle filling one quarter of a cell, with one corner in the
center and the opposite corner at a corner of the cell. So "Quadrant UL"
is the upper left quadrant; "Quadrant UL and UR" is the top half of the
cell (which happens to be coincident with U+2580 and so is not included
here).
Line
Refers to a line that extends all the way to opposite edge(s) of a cell,
designed to be joined to (a) line(s) in the adjacent cell(s).
Bar
Refers to a horizontal line that does not touch any cell edges.
Wedge
Refers to a character cell with a diagonal line connecting opposite
corners, dividing it into two triangles; one black, the other white. Thus
an UL Wedge is similar to U+25E9, except it fills the entire character
cell.
Framus
(Pick a better word!) is a shape composed of two triangles with their
points meeting at the center of the cell to form an X with bars across the
top and bottom, closing the open ends. A black framus has the two
triangles filled in; a white one is in outline form. A framus with center
bar has a horizontal line through the center of the cell.
Figure 7.1: "Framus" Glyphs
  White      Black     With Bar
 *******    *******    *******
  *   *      *****      *   *
   * *        ***        * *
    *          *      *********
   * *        ***        * *
  *   *      *****      *   *
 *******    *******    *******
Table 7.1: Additional Line, Box, and Block Characters
Code Description References
E0D0 L V box line, extensible H19 07/12 (1)
E0D1 R V box line, extensible H19 07/13 (1)
E0D2 UL Wedge H19 07/02, IBM SF870000
E0D3 UR Wedge H19 05/14, IBM SF860000
E0D4 LL Wedge IBM SF850000
E0D5 LR Wedge IBM SF840000
E0D6 H line - Scan 1 DSG 06/15, H19 07/10, WG3 05/00, TVI 09/00
E0D7 H line - Scan 3 DSG 07/00, Wyse ANSI 01/01, WG3 05/00
H line - Scan 5 DSG 07/01, Wyse ANSI 02/02 (2)
E0D9 H line - Scan 7 DSG 07/02, Wyse ANSI 01/03, WG3 05/01
E0DA H line - Scan 9 DSG 07/03, H19 07/11, WG3 05/01, TVI 09/01
E0DB Quadrant LL H19 06/13, WG3 05/05, TVI 09/05
E0DC Quadrant LR H19 06/12, WG3 05/04, TVI 09/04
E0DD Quadrant UL H19 06/14, WG3 05/06, TVI 09/06
E0DE Quadrant UL and LL and LR WG3 05/11, TVI 09/11
E0DF Quadrant UL and LR H19 06/10 (3)
E0E0 Quadrant UL and UR and LL WG3 05/12, TVI 09/12
E0E1 Quadrant UL and UR and LR WG3 05/13, TVI 09/13
E0E2 Quadrant UR H19 06/15, WG3 05/03, TVI 09/03
E0E3 Quadrant UR and LL (for completeness)
E0E4 Quadrant UR and LL and LR WG3 05/14, TVI 09/14
E0E5 Full black diamond TVI 09/02 (4)
E0E6 Black framus DGM 06/08
E0E7 Black framus + H center bar DGM 06/09
E0E8 White framus DGM 06/10
E0E9 White framus + H center bar DGM 06/11
E0EA R & L arrow to V center bar DGM 03/13
E0EB Up arrow to H center line DGL 02/12
E0EC R arrow to V center line DGL 02/13
E0ED L arrow to V center line DGL 02/14
E0EE Down arrow to H center line DGL 02/12
E0EF Box drawing double dash H DGL 03/12 (5)
References:
DGM = Data General Word-Processing, Greek, and Math Character Set [2]
DGL = Data General Line Drawing Character Set [2]
DSG = The DEC Special Graphics Character Set [5]
H19 = The Heath/Zenith 19 Graphics Character Set [10]
WG3 = The Wyse Graphics 3 Character Set [25]
TVI = The Televideo 965 Multinational Character Set [23]
IBM = Graphic Character Global Identifier (GCGID) [14]
Wyse ANSI = Wyse 60 "Standard ANSI", "UK ANSI", and "ANSI Graphics" [25]
Notes:
(1) The vertical box lines are near, but not touching, the left and right
edges of the cell, respectively, and are two pixels thick on the H19
screen. Similar to IBM GCGID SF640000 and SF650000, respectively.
(2) The center horizontal scan line is already in Unicode at U+2500.
(3) Only on Zenith models, not original Heathkits.
(4) Full black diamond, with points touching center of each cell wall.
(5) Similar to U+2504 but double rather than triple.
Also note that Quadrants UL+UR, UR+LR, LL+LR, UL+LL (half blocks) are
already encoded at block U+2580.
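The quadrant entries can be checked for completeness by treating each cell
as a 4-bit mask. In the Python sketch below the bit assignments and names
are hypothetical; the E0xx values are the temporary codes from Table 7.1,
and the half blocks, full block, and empty cell use existing characters
(U+2580, U+2584, U+258C, U+2590, U+2588, and space).

```python
UL, UR, LL, LR = 1, 2, 4, 8  # one bit per quadrant of the character cell

QUADRANTS = {
    0: 0x0020,                     # empty cell: ordinary space
    UL: 0xE0DD,
    UR: 0xE0E2,
    LL: 0xE0DB,
    LR: 0xE0DC,
    UL | UR: 0x2580,               # upper half block (existing)
    LL | LR: 0x2584,               # lower half block (existing)
    UL | LL: 0x258C,               # left half block (existing)
    UR | LR: 0x2590,               # right half block (existing)
    UL | LR: 0xE0DF,
    UR | LL: 0xE0E3,
    UL | LL | LR: 0xE0DE,
    UL | UR | LL: 0xE0E0,
    UL | UR | LR: 0xE0E1,
    UR | LL | LR: 0xE0E4,
    UL | UR | LL | LR: 0x2588,     # full block (existing)
}


def quadrant_char(mask):
    """Return the character whose black quadrants match the 4-bit mask."""
    return chr(QUADRANTS[mask])


def proposed_quadrants():
    """List the masks that need one of this draft's temporary codes."""
    return sorted(m for m, cp in QUADRANTS.items() if 0xE000 <= cp <= 0xF8FF)
```

All 16 combinations are covered: six by existing characters and ten by the
quadrant entries of Table 7.1, including the UR and LL pair that no
surveyed terminal provides.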
Summary:
31 New glyphs, Range E0D0 to E0EF, one vacancy.
8. MISCELLANEOUS SINGLE-CELL GLYPHS
Table 8.1: Miscellaneous Single-Cell Terminal Glyphs
Code Description Reference
E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1)
E0F1 Box with X inside DG Math 06/07, GCGID SP500000
E0F2 Human stick figure with hat SNI Facet 04/14
E0F3 Clock (with hands at 3:00) SNI Klammern 05/01
E0F4 Overscore asterisk IBM 3270
E0F5 Overscore semicolon IBM 3270
E0F6 Padlock (keyboard locked) IBM 3270
Notes:
(1) The reverse question mark is essential in VT terminal emulation, where
it indicates that an invalid code was received, or a parity or other
error was detected. It also stands for SUB and/or RS in Wyse display
controls mode, and is the glyph for 0xFF in the Televideo Multinational
Character Set [23]. It is also a glyph in the DG Special Graphics
Character Set [2].
Summary:
7 New glyphs, Range E0F0 to E0FF, 9 vacant.
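The per-section summaries can be cross-checked against the 512 positions
set aside in Section 3. The Python sketch below simply adds up the figures
stated in the summaries of Sections 4 through 8; the names are hypothetical.

```python
# (positions reserved, new characters proposed), from the section summaries.
SUMMARIES = {
    "Control Pictures": (160, 115),
    "Hex Bytes": (256, 256),
    "Math Symbols": (48, 34),
    "Line and Box Drawing": (32, 31),
    "Miscellaneous": (16, 7),
}


def census():
    """Return (positions, characters, vacancies) over all five groups."""
    positions = sum(p for p, _ in SUMMARIES.values())
    characters = sum(c for _, c in SUMMARIES.values())
    return positions, characters, positions - characters
```

On these figures the draft proposes 443 characters in 512 positions,
leaving 69 vacant (45 + 0 + 14 + 1 + 9).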
9. UNFINISHED BUSINESS
The selection of characters presented in this draft is far from
comprehensive. Hundreds of other terminals from the past 30+ years are
likely to have glyphs or entire character sets covered neither here nor
in Unicode, and these might or might not be important in some application
somewhere. Readers are invited, therefore, to propose any needed
additions, bearing in mind that Unicode code space is not unlimited.
No attempt was made to account for the many Viewdata, Videotex, Minitel,
NAPLPS, or other mosaic graphics character sets. These should be tackled,
if appropriate, by someone who knows something about them.
Several character sets found in the references consulted are ignored here,
fully or in part, due to lack of motivation (nobody has ever asked us to
support them). Obviously these, and any other missing sets, can be
considered if there is a demand.
Siemens Nixdorf Facet
A set of 95 mosaic graphics, but not resembling any of the ISO Videotex
mosaic sets; difficult to describe.
Siemens Nixdorf Klammern
A set of 95 assorted blobs, bracket and brace pieces, clocks, arrows,
hourglasses, and Greek letters, some of which are unique; others can be
unified with existing Unicode characters or characters in this proposal.
Hewlett Packard Line Drawing
Mostly coincident with Unicode box-drawing set at U+2500, but with a
handful of unique characters, such as single-to-triple box intersections,
single-to-double intersections with wide spacing, etc. These should be
mappable to existing U+25xx glyphs without causing riots in the streets.
Hewlett Packard Big Character Pieces
Thick line segments for drawing large characters, used on the HP-2648.
And no doubt many more...
10. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS
If all the proposed new characters are added to the UCS, this will enable
terminal emulators to fully handle at least the following terminal character
sets, which were not previously covered in full:
ASCII/ISO Display Controls for DEC, Hewlett Packard, Televideo, and others.
EBCDIC Display Controls for the IBM 3270
Hexadecimal debugging
DEC Technical
DEC Special Graphics
Data General Word-Processing, Greek, and Math (1)
Data General Line Drawing
Heath/Zenith 19 Graphics
Hewlett Packard 2621 and HPTERM
Siemens Nixdorf's "IBM" set (plus parts of its Klammern and Facet sets)
Televideo Multinational
Wyse Graphics 3 (Graphics 1 and 2 were already covered)
Wyse "Standard ANSI", "UK ANSI", and "ANSI Graphics"
(1) Except the DG logo character, which is presumed off limits.
Terminals supporting these character sets are numerous indeed. An
incomplete list includes: DEC VT100, VT102, VT220/240, VT320/330/340,
VT420, VT520/525; Data General 210, 215, 217, 413, and 463; the Heath /
Zenith 19; and numerous Televideo and Wyse models.
Table 10.1 lists the new characters proposed in this document.
Table 10.1: Census of New Characters
Code Glyph Description
E000 NUL Diagonal Control Picture Null
E001 SOH Diagonal Control Picture Start of Heading
E002 STX Diagonal Control Picture Start of Text
E003 ETX Diagonal Control Picture End of Text
E004 EOT Diagonal Control Picture End of Transmission
E005 ENQ Diagonal Control Picture Enquiry
E006 ACK Diagonal Control Picture Acknowledge
E007 BEL Diagonal Control Picture Bell
E008 BS Diagonal Control Picture Backspace
E009 HT Diagonal Control Picture Horizontal Tab
E00A LF Diagonal Control Picture Line Feed
E00B VT Diagonal Control Picture Vertical Tab
E00C FF Diagonal Control Picture Form Feed
E00D CR Diagonal Control Picture Carriage Return
E00E SO Diagonal Control Picture Shift Out
E00F SI Diagonal Control Picture Shift In
E010 DLE Diagonal Control Picture Data Link Escape
E011 DC1 Diagonal Control Picture Device Control 1
E012 DC2 Diagonal Control Picture Device Control 2
E013 DC3 Diagonal Control Picture Device Control 3
E014 DC4 Diagonal Control Picture Device Control 4
E015 NAK Diagonal Control Picture Negative Acknowledge
E016 SYN Diagonal Control Picture Synchronous Idle
E017 ETB Diagonal Control Picture End of Transmission Block
E018 CAN Diagonal Control Picture Cancel
E019 EM Diagonal Control Picture End of Medium
E01A SUB Diagonal Control Picture Substitute
E01B ESC Diagonal Control Picture Escape
E01C FS Diagonal Control Picture File Separator
E01D GS Diagonal Control Picture Group Separator
E01E RS Diagonal Control Picture Record Separator
E01F US Diagonal Control Picture Unit Separator
E020 (vacant)
E021 (vacant)
E022 BPH Diagonal Control Picture Break Permitted Here
E023 NBH Diagonal Control Picture No Break Here
E024 IND Diagonal Control Picture Index
E025 NEL Diagonal Control Picture Next Line
E026 SSA Diagonal Control Picture Start Selected Area
E027 ESA Diagonal Control Picture End Selected Area
E028 HTS Diagonal Control Picture Character Tabulation Set
E029 HTJ Diagonal Control Picture Character Tabulation with Justification
E02A VTS Diagonal Control Picture Line Tabulation Set
E02B PLD Diagonal Control Picture Partial Line Forward
E02C PLU Diagonal Control Picture Partial Line Backward
E02D RI Diagonal Control Picture Reverse Line Feed
E02E SS2 Diagonal Control Picture Single Shift 2
E02F SS3 Diagonal Control Picture Single Shift 3
E030 DCS Diagonal Control Picture Device Control String
E031 PU1 Diagonal Control Picture Private Use 1
E032 PU2 Diagonal Control Picture Private Use 2
E033 STS Diagonal Control Picture Set Transmit State
E034 CCH Diagonal Control Picture Cancel Character
E035 MW Diagonal Control Picture Message Waiting
E036 SPA Diagonal Control Picture Start Protected (Guarded) Area
E037 EPA Diagonal Control Picture End Protected (Guarded) Area
E038 SOS Diagonal Control Picture Start of String
E039 (vacant)
E03A SCI Diagonal Control Picture Single Character Introducer
E03B CSI Diagonal Control Picture Control Sequence Introducer
E03C ST Diagonal Control Picture String Terminator
E03D OSC Diagonal Control Picture Operating System Command
E03E PM Diagonal Control Picture Privacy Message
E03F APC Diagonal Control Picture Application Program Command
E040 PF Diagonal Control Picture Punch Off
E041 PN Diagonal Control Picture Punch On
E042 LC Diagonal Control Picture Lower Case
E043 UC Diagonal Control Picture Upper Case
E044 SMM Diagonal Control Picture Start of Manual Message
E045 TM Diagonal Control Picture Tape Mark
E046 RES Diagonal Control Picture Restore
E047 IL Diagonal Control Picture Idle
E048 CC Diagonal Control Picture Cursor Control
E049 CU1 Diagonal Control Picture Customer Use 1
E04A CU2 Diagonal Control Picture Customer Use 2
E04B CU3 Diagonal Control Picture Customer Use 3
E04C CU4 Diagonal Control Picture Customer Use 4
E04D IFS Diagonal Control Picture Interchange File Separator
E04E IGS Diagonal Control Picture Interchange Group Separator
E04F IUS Diagonal Control Picture Interchange Unit Separator
E050 DS Diagonal Control Picture Digit Select
E051 SOS Diagonal Control Picture Start of Significance
E052 BYP Diagonal Control Picture Bypass
E053 SM Diagonal Control Picture Set Mode
E054 (vacant through E05F)
E060 VCS Vertical Channel Select
E061 GE Graphics Escape
E062 ENP Enable Presentation
E063 IRS Interchange Record Separator
E064 INP Inhibit Presentation
E065 SA Set Attribute
E066 FMT Format
E067 TRN Transparent
E068 SF Start Field
E069 SFE Start Field Extended
E06A SBA Set Buffer Address
E06B MF Modify Field
E06C PT Program Tab
E06D RA Repeat to Address
E06E EUA Erase to Unprotected Address
E06F DUP Duplicate
E070 FM Field Mark
E071 EO Eight Ones
E072 (vacant through E07F)
E080 SP Diagonal Control Picture Space
E081 DEL Diagonal Control Picture Delete
E082 LS1 Diagonal Control Picture Locking Shift 1
E083 LS0 Diagonal Control Picture Locking Shift 0
E084 IS4 Diagonal Control Picture Information Separator 4
E085 IS3 Diagonal Control Picture Information Separator 3
E086 IS2 Diagonal Control Picture Information Separator 2
E087 IS1 Diagonal Control Picture Information Separator 1
E088 CL Diagonal Control Picture Cancel Line
E089 BP Diagonal Control Picture DG Word Processing BP
E08A BE Diagonal Control Picture DG Word Processing BE
E08B FN Diagonal Control Picture DG Word Processing FN
E08C FE Diagonal Control Picture DG Word Processing FE
E08D HF Diagonal Control Picture DG Word Processing HF
E08E Diagonal crosshatches
E08F Picture of bell
E090 (vacant through E09F)
E0A0 Extensible left brace middle
E0A1 Extensible left parenthesis bottom
E0A2 Extensible left parenthesis top
E0A3 Extensible left SB bottom
E0A4 Extensible left SB top
E0A5 Extensible right brace middle
E0A6 Extensible UR or LL brace section
E0A7 Extensible LR or UL brace section
E0A8 Extensible right parenthesis bottom
E0A9 Extensible right parenthesis top
E0AA Extensible right SB bottom
E0AB Extensible right SB top
E0AC Summation symbol bottom
E0AD Summation symbol top
E0AE Right ceiling corner
E0AF Right floor corner
E0B0 Radical symbol, small
E0B1 Radical symbol with stroke
E0B2 Superscript Latin small letter i
E0B3 Latin small letter a with underbar
E0B4 Latin capital letter H with bar
E0B5 Latin small letter h with bar
E0B6 Latin capital letter L with dot
E0B7 Latin small letter L with dot
E0B8 Latin capital letter O with underbar
E0B9 Latin small letter t with bar
E0BA Latin small script letter t with bar
E0BB Eng-like letter
E0BC Eng-like letter, fatter
E0BD Eng-like letter with vertical stroke
E0BE Superscript almost-equal-to sign
E0BF Superscript capital Greek letter Sigma
E0C0 Superscript infinity sign
E0C1 Superscript proportional-to sign
E0C2 (vacant through E0CF)
E0D0 L V box line, extensible
E0D1 R V box line, extensible
E0D2 UL Wedge
E0D3 UR Wedge
E0D4 LL Wedge
E0D5 LR Wedge
E0D6 H line - Scan 1
E0D7 H line - Scan 3
E0D8 (vacant)
E0D9 H line - Scan 7
E0DA H line - Scan 9
E0DB Quadrant LL
E0DC Quadrant LR
E0DD Quadrant UL
E0DE Quadrant UL and LL and LR
E0DF Quadrant UL and LR
E0E0 Quadrant UL and UR and LL
E0E1 Quadrant UL and UR and LR
E0E2 Quadrant UR
E0E3 Quadrant UR and LL
E0E4 Quadrant UR and LL and LR
E0E5 Full black diamond
E0E6 Black framus
E0E7 Black framus + H center bar
E0E8 White framus
E0E9 White framus + H center bar
E0EA R & L arrow to V center bar
E0EB Up arrow to H center line
E0EC R arrow to V center line
E0ED L arrow to V center line
E0EE Down arrow to H center line
E0EF Box drawing double dash H
E0F0 Reverse Question Mark
E0F1 Box with X inside
E0F2 Human stick figure with hat
E0F3 Clock at 3:00
E0F4 Overscore asterisk
E0F5 Overscore semicolon
E0F6 Padlock
E0F7 (vacant through E0FF)
E100 (through E1FF): Hex Bytes
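The layout of the table suggests a simple arithmetic mapping from a control
byte to its provisional code. The following Python sketch illustrates it. The
function names are hypothetical, the offsets are read off the table rather
than defined anywhere in this draft, and the hex-byte rule (E100 plus the byte
value) is an assumption based on the E100 through E1FF range holding exactly
256 positions.

```python
from typing import Optional

# Offsets inferred from Table 10.1 (provisional Private Use codes).
C0_BASE = 0xE000                 # E000..E01F: C0 controls NUL..US in code order
C1_BASE = 0xE020                 # E020..E03F: C1 controls 0x80..0x9F in code order
C1_VACANT = {0x80, 0x81, 0x99}   # E020, E021, E039 are marked vacant
SP_PICTURE = 0xE080              # Diagonal Control Picture Space
DEL_PICTURE = 0xE081             # Diagonal Control Picture Delete
HEX_BASE = 0xE100                # E100..E1FF: hex byte pictures (assumed order)


def control_picture(byte: int) -> Optional[int]:
    """Return the provisional code for a control byte, or None if the
    table lists no diagonal control picture for it."""
    if 0x00 <= byte <= 0x1F:
        return C0_BASE + byte
    if byte == 0x20:
        return SP_PICTURE
    if byte == 0x7F:
        return DEL_PICTURE
    if 0x80 <= byte <= 0x9F and byte not in C1_VACANT:
        return C1_BASE + (byte - 0x80)
    return None


def hex_byte_picture(byte: int) -> int:
    """Return the provisional code for the hex byte picture of a byte,
    assuming the pictures are assigned in byte-value order."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    return HEX_BASE + byte


if __name__ == "__main__":
    for b in (0x1B, 0x7F, 0x9B, 0x99):
        code = control_picture(b)
        shown = "none" if code is None else format(code, "04X")
        print(format(b, "02X"), "->", shown)
```

A terminal emulator in "debug" or "display controls" mode could use a lookup
of this kind to choose the glyph for each incoming control byte, falling back
to a hex byte picture when no control picture is listed.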
Summary:
E000 through E1FF = 512 positions, 69 vacant.
Codes in Exxx block to be moved to a non-Private Use area.
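The census figures can be cross-checked mechanically. The Python sketch below
tallies the positions that Table 10.1 marks as vacant; the list of ranges is
transcribed from the table by hand, and the variable names are illustrative
only.

```python
# Inclusive ranges marked "(vacant)" in Table 10.1.
VACANT_RANGES = [
    (0xE020, 0xE021),
    (0xE039, 0xE039),
    (0xE054, 0xE05F),
    (0xE072, 0xE07F),
    (0xE090, 0xE09F),
    (0xE0C2, 0xE0CF),
    (0xE0D8, 0xE0D8),
    (0xE0F7, 0xE0FF),
]

BLOCK_FIRST = 0xE000
BLOCK_LAST = 0xE1FF


def span(first: int, last: int) -> int:
    """Number of code positions in an inclusive range."""
    return last - first + 1


total = span(BLOCK_FIRST, BLOCK_LAST)
vacant = sum(span(first, last) for first, last in VACANT_RANGES)
assigned = total - vacant

if __name__ == "__main__":
    print("positions:", total)
    print("vacant:   ", vacant)
    print("assigned: ", assigned)
```

Rerunning the tally after any change to the table keeps the summary line in
step with the listed codes.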
11. REFERENCES
[1] American National Standards Institute, ANSI X3.4-1986, Code for
Information Interchange (ASCII), 1986.
[2] Data General, Programming the Display Terminal: Models D217, D413, and
D463, Westboro, MA, 1991.
[3] Digital Equipment Corporation, VT100 User Guide, EK-VT100-UG-002,
Maynard, MA, 1979.
[4] Digital Equipment Corporation, VT102 Video Terminal User Guide,
EK-VT102-UG-003, Maynard, MA, 1982.
[5] Digital Equipment Corporation, VT220 Owner's Manual, EK-VT220-UG-003,
Maynard, MA, 1984.
[6] Digital Equipment Corporation, VT240 Series Programmer Reference
Manual, EK-VT240-RM-002, Maynard, MA, 1984.
[7] Digital Equipment Corporation, VT330/VT340 Programmer Reference Manual,
Volume 1: Text Programming, EK-VT3XX-TP-002, Maynard, MA, 1988.
[8] Digital Equipment Corporation, Installing and Using the VT420 Video
Terminal, EK-VT420-UG.002, Maynard, MA, 1988.
[9] Digital Equipment Corporation, VT520/VT525 Video Terminal Programmer
Information, EK-VT520-RM.A01, Maynard, MA, 1994.
[10] Heathkit Manual for the Video Terminal Model H19, The Heath Company,
Benton Harbor, MI, 1979.
[11] Hewlett Packard 2621A/P Interactive Terminal Owner's Manual, 1978.
[12] Hewlett Packard 2648A Graphics Terminal Reference Manual, 1977.
[13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie,
NY, 1970.
[14] IBM National Language Design Guide, Volume 2: National Language
Support Reference Manual, 4th Edition, North York, ON, 1994.
[15] IBM 3270 Information Display System, Data Stream Programmer's
Reference, GA23-0059-06, 1991.
[16] IBM 3164 ASCII Color Display Station Description, GA18-2317-1, 1986.
[17] ISO International Standard 2022, Information processing -- ISO
7-bit and 8-bit coded character sets -- Code extension techniques,
Third Edition, Geneva, 1986.
[18] ISO/IEC International Standard 6429, Information technology --
Control functions for coded character sets, Third Edition, Geneva, 1992.
[19] ISO/IEC 10646-1, International Standard 10646,
Information Processing -- Multiple-Octet Coded Character Set,
1993-now.
[20] Perkin Elmer Model 1100 User's Manual, Randolph, NJ, 1978.
[21] Siemens Nixdorf, Bildschirmeinheit 97801-5xx Schnittstellen,
Benutzerhandbuch, München, 1991.
[22] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale, CA,
1984.
[23] Televideo 965 Video Terminal Display Operator's Manual, Sunnyvale, CA,
1988.
[24] The Unicode Standard, Version 2.0, Addison-Wesley Developers
Press, 1996.
[25] Wyse WY-60 Programmer's Guide, Wyse Technology, San Jose, CA, 1987.
[26] Wyse WY-370 Programmer's Guide, Wyse Technology, San Jose, CA, 1990.
(End)