Re: Terminal Graphics Proposal

From: Paul Keinanen (keinanen@sci.fi)
Date: Thu Oct 01 1998 - 12:41:19 EDT


Kevin Bracey said most of what I was planning to say about this interesting
proposal, but here are some more observations.

Frank da Cruz <fdc@watsun.cc.columbia.edu> wrote:

>Table 4.2: C0 Control Pictures
>
> Code  Name  2X    Code  Name  2X
> E000  NUL   NU    E010  DLE   DL
> E001  SOH   SH    E011  DC1   D1
> E002  STX   SX    E012  DC2   D2
> E003  ETX   EX    E013  DC3   D3
> E004  EOT   ET    E014  DC4   D4
> E005  ENQ   EQ    E015  NAK   NK
> E006  ACK   AK    E016  SYN   SY
> E007  BEL   BL    E017  ETB   EB
> E008  BS    BS    E018  CAN   CN
> E009  HT    HT    E019  EM    EM
> E00A  LF    LF    E01A  SUB   SU
> E00B  VT    VT    E01B  ESC   EC
> E00C  FF    FF    E01C  FS    FS
> E00D  CR    CR    E01D  GS    GS
> E00E  SO    SO    E01E  RS    RS
> E00F  SI    SI    E01F  US    US
>
>There is little to gain by defining separate 2- and 3-character glyphs for
>control characters that have 3-character names; therefore it is suggested
>that the full abbreviation (from the Name column) be used, with the
>characters arranged diagonally within each cell (rather than horizontally as
>in the U+2400 block), and that the 2X column be ignored.

As far as I know, the Unicode standard does not specify the writing
direction or actual representation of these characters. I would think that
the two or three character forms are just variations of the same glyph. To
me, it would make perfect sense from a readability point of view to use e.g.
AK (horizontally, diagonally or vertically spaced) for a very small font and
ACK for larger fonts with more available pixels.

If all octet values (00 .. FF) are also going to be displayed, there might
be some ambiguity with some of the two letter codes, e.g. FF, D1, D2, D3,
D4, EB and EC, which should be noted in the actual font design.
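
The list above can be checked mechanically. The following is only a small
sketch in Python; the 2X abbreviations are copied from the quoted Table 4.2,
while the function name ambiguous_with_hex is my own invention and not part
of the proposal.

```python
# 2X abbreviations copied from the quoted Table 4.2 (C0 control pictures).
C0_2X = [
    "NU", "SH", "SX", "EX", "ET", "EQ", "AK", "BL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DL", "D1", "D2", "D3", "D4", "NK", "SY", "EB",
    "CN", "EM", "SU", "EC", "FS", "GS", "RS", "US",
]

HEX_DIGITS = set("0123456789ABCDEF")


def ambiguous_with_hex(labels):
    """Return the labels that also read as a two-digit hex byte value."""
    return sorted(
        label for label in labels
        if len(label) == 2 and set(label) <= HEX_DIGITS
    )


print(ambiguous_with_hex(C0_2X))
# → ['D1', 'D2', 'D3', 'D4', 'EB', 'EC', 'FF']
```

These are exactly the seven codes that a font designer would have to make
visually distinct from the hex byte glyphs.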
 

>C1 Control characters are specified in ISO-6429 and used in the VT220
>family of terminals [5] and the Wyse 370 [26], where they are represented
>in the right half of the "display controls" font as shown in Table 4.3 (DEC
>terminals use the full name, Wyse terminals use the 2X name). As with C0
>controls, the "name" is displayed diagonally within the character cell.
>Unicode presently includes no C1 control pictures.

Looking through various EBCDIC code pages (e.g. IBM278, IBM880) and other
unnumbered sets, it appears that all of these control codes are also
available in EBCDIC, but of course at different positions (e.g. IND at 0x24). Some
references to these sets are "IBM NLS RM Vol2 SE09-8002-01, March 1990" and
"IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987".

Based on this observation, it is strange that the C0 control pictures are in
the Unicode standard, but not the C1 control pictures.

>Table 4.3: C1 Control Pictures

>Note that three of the C1 control pictures are unassigned (the ones marked
>by "(1)", that would be at U+E020, U+E021, and U+E039 if these were
>assigned). These positions should be left vacant in case names are assigned
>to these characters in a future revision of ISO 6429.

In ISO 8859-1 these are listed as

80 PADDING CHARACTER (PAD)
81 HIGH OCTET PRESET (HOP)

99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)

>Table 4.4 shows the names of control characters unique to EBCDIC (that is,
>the ones it does not share with ASCII).

There seem to be different names for the same EBCDIC control characters, and
some of these names are equivalent to the ASCII names. What should be done
with these control pictures? Some examples are below.

> E082 LS1 Locking Shift 1 (ISO name for SO)
> E083 LS0 Locking Shift 0 (ISO name for SI)
> E084 IS4 ISO Name for FS: Information Separator 4
> E085 IS3 ISO Name for GS: Information Separator 3
> E086 IS2 ISO Name for RS: Information Separator 2
> E087 IS1 ISO Name for US: Information Separator 1

>5. HEX BYTES
>
>Hexadecimal byte values, 2 hex digits each. Like display controls, but for
>all 256 8-bit byte values, showing the byte code in hexadecimal, rather than
>the (context-dependent) name. For hex debugging (in terminal emulators,
>line monitors, protocol analyzers, etc). Should be arranged diagonally
>within the character cell as shown in Figure 5.1:

These would be very nice :-). Note the possible ambiguity with some
two-character control pictures, e.g. FF, EB etc., so special precautions
should be taken when designing the fonts.
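
As an illustration of the diagonal arrangement, a rough Python sketch
follows. Figure 5.1 is not reproduced here, so the direction (top left to
bottom right) is my assumption, and the function names diagonal_cell and
hex_byte_cell are made up for this example.

```python
def diagonal_cell(label):
    """Lay out a label diagonally, one character per row of the cell.

    Assumption: the diagonal runs from top left to bottom right.
    """
    n = len(label)
    return [" " * i + ch + " " * (n - i - 1) for i, ch in enumerate(label)]


def hex_byte_cell(value):
    """Return the diagonal two-digit hex picture for a byte value 0..255."""
    if not 0 <= value <= 0xFF:
        raise ValueError("byte value out of range")
    return diagonal_cell(f"{value:02X}")


# The hex byte FF and the 2X control picture FF come out identical, which is
# the ambiguity noted above.
print("\n".join(hex_byte_cell(0xFF)))
print("\n".join(diagonal_cell("ACK")))
```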

>8. MISCELLANEOUS SINGLE-CELL GLYPHS

>Notes:
> (1) The reverse question is essential in VT terminal emulation, where it
> indicates that an invalid code was received, or a parity or other
> error was detected. It also stands for SUB and/or RS in Wyse display
> controls mode, and is the glyph for 0xFF in the Televideo Multinational
> Character Set [23]. And it is also a glyph in the DG Special
> Graphics Character Set [2].

Even ISO-Latin1 contains the reverse question mark at 0xBF, so there is no
need to re-invent it.

>9. UNFINISHED BUSINESS

>No attempt was made to account for the many Viewdata, Videotex, Minitel,
>NAPLPS, or other mosaic graphics character sets. These should be tackled,
>if appropriate, by someone who knows something about them.

And not forgetting the teletext block characters on European TVs. With the
introduction of TV cards for PCs that also contain a teletext decoder, there
is a need to display the text and block graphics on a PC. As far as I
remember, the block graphic format is more or less the same as Viewdata, with
2 columns and 3 rows per character cell. Each of the 6 sub-cells is either
on or off, thus requiring 64 glyphs.
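
The count can be confirmed by enumerating the patterns. This is a minimal
Python sketch only; the bit order (bit 0 = top left, counting left to right
and then downwards) is an assumption of mine and does not follow any
particular teletext or Viewdata code table.

```python
COLS, ROWS = 2, 3


def mosaic_rows(pattern):
    """Render one 2x3 block-graphic pattern as three rows of '#' and '.'.

    Assumed bit order: bit 0 is the top-left sub-cell, bit 5 the bottom-right.
    """
    if not 0 <= pattern < 2 ** (COLS * ROWS):
        raise ValueError("pattern out of range")
    rows = []
    for r in range(ROWS):
        row = ""
        for c in range(COLS):
            bit = r * COLS + c
            row += "#" if (pattern >> bit) & 1 else "."
        rows.append(row)
    return rows


ALL_MOSAICS = [mosaic_rows(p) for p in range(2 ** (COLS * ROWS))]
print(len(ALL_MOSAICS))  # 64
```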

All in all a very interesting proposal. By using as many existing characters
from the current Unicode standard as possible, I guess there would be a
greater likelihood of getting things officially approved.

Paul


