Greetings--hi all, I'm a new poster. I read on the unicode.org website that a good way to gauge interest and get a proposal through the process is to gather feedback and comments here before investing the time in a formal proposal, so, here goes...
This posting is to propose the addition of C1 Control Pictures to Unicode. It is being proposed by me, Sean Leonard, with the advice and +1 of Frank da Cruz.
Many years ago (in 1998), Frank da Cruz proposed a large number of additional characters for terminal emulation and the like, which can be found on the web and in the mail list archive variously:
ADDITIONAL CONTROL PICTURES FOR UNICODE ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt
TERMINAL GRAPHICS FOR UNICODE ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt
HEX BYTE PICTURES FOR UNICODE ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt
Subject lines (1998):
Terminal Graphics Proposal
Terminal Graphics Draft 2
The proposal I would like to make here is much more modest: this proposal is only for the inclusion of C1 Control Pictures into the Unicode Standard. Frank explained to me that his original mega-proposals were rejected. However, I looked through the "Archive of Notices of Non-Approval" and was unable to find an explicit rejection of his proposals. In any event, if one reads through the old e-mail threads from 1998, one will find that the C1 Control Pictures subset of the proposals received a (luke)warm welcome.
RATIONALE
The Unicode code points U+0000 through U+00FF have the same values as the corresponding characters in the ASCII Standard, ISO 646, ISO 6429, and ISO 8859-1. In many contexts, it is desirable to display all of these code points/characters uniquely and unambiguously. C0 Control Pictures are currently encoded in the Unicode Standard in the block starting at U+2400; that block currently covers the undisplayable code points at U+0000-U+0020 and U+007F (plus a few extra alternatives/additions). However, the undisplayable characters in U+0080-U+00FF are left out.
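To make the existing C0 coverage concrete, here is a minimal sketch of the mapping from a control code to its Control Pictures character. The function name c0_control_picture is my own for illustration; the only facts relied on are that U+2400-U+2420 mirror U+0000-U+0020 and that U+2421 is SYMBOL FOR DELETE.

```python
def c0_control_picture(code_point: int) -> str:
    """Return the existing Control Pictures character for a C0 control, SPACE, or DELETE.

    Hypothetical helper for illustration; not part of any standard API.
    """
    if 0x00 <= code_point <= 0x20:
        # U+2400..U+2420 mirror U+0000..U+0020 one-to-one.
        return chr(0x2400 + code_point)
    if code_point == 0x7F:
        # DELETE sits out of sequence at U+2421 SYMBOL FOR DELETE.
        return "\u2421"
    # Nothing is encoded for U+0080..U+009F, which is the gap discussed here.
    raise ValueError(f"no control picture is encoded for U+{code_point:04X}")


print(c0_control_picture(0x1B))  # U+241B SYMBOL FOR ESCAPE
```

Calling it with any C1 value, such as 0x9B (CSI), raises ValueError, because there is no code point to return.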
There are several business cases in which C1 Control Pictures are useful:
1. Terminal emulators need them for debugging.
2. Data analyzers need them so that each control code has a unique character that text renderers and graphics subsystems treat as intended for display rather than for control effects.
3. Engineers can distinguish, when communicating, between data without side-effects (i.e., control characters as pictures) and data that invokes side-effects (i.e., control characters used as control characters).
4. There are use cases for historic or scholarly purposes, to encode and discuss these characters in text, as distinct from invoking their side-effects (and displaying nothing).
5. To display all values in U+0000 - U+00FF as distinct _characters_, rather than in hexadecimal representation (which makes deciphering the meaning of the codes for graphic characters in the ASCII (G0) & ISO 8859-1 (G1) range very difficult), in the same width and font as the rest of the graphic characters.
6. In support of 1-5, font designers can design fonts that support C1 Control Pictures and that map glyphs to Unicode code points uniformly and interchangeably (two key architectural goals of the Unicode Standard). Without C1 Control Pictures, it is infeasible to provide graphical representations of the C1 Control Characters. This is an asymmetry compared to the C0 Control Pictures block in Unicode, and thus should be remedied.
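As a rough illustration of the asymmetry, the following sketch renders an ISO 8859-1 byte string for display. The function name visualize_latin1 and the "<9B>" placeholder style are my own assumptions; the point is only that C0 bytes can be shown as single characters today, while C1 bytes have to fall back to something like a hex escape.

```python
def visualize_latin1(data: bytes) -> str:
    """Render ISO 8859-1 bytes for display without triggering control effects.

    Hypothetical helper: C0 controls, SPACE, and DELETE use the existing
    Control Pictures block; C1 controls fall back to a hex placeholder.
    """
    out = []
    for byte in data:
        if byte <= 0x20:
            out.append(chr(0x2400 + byte))   # existing C0 control picture
        elif byte == 0x7F:
            out.append("\u2421")             # SYMBOL FOR DELETE
        elif 0x80 <= byte <= 0x9F:
            out.append(f"<{byte:02X}>")      # no picture exists, so use hex
        else:
            out.append(chr(byte))            # graphic character, shown as-is
    return "".join(out)


# ESC [ 1 m (7-bit form) next to CSI 1 m (8-bit form)
print(visualize_latin1(b"A\x1b[1m\x9b1mB"))
```

Note that the hex placeholder is four columns wide where every other character is one, and that NO-BREAK SPACE and SOFT HYPHEN pass through unchanged, so they remain visually indistinguishable from SPACE and HYPHEN-MINUS.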
Quoting from the Unicode Standard 6.0.0, sec. 16.1:
There are 65 code points [C0, C1, delete] set aside...for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework.
The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics. ... [i]n the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992.
In accordance with this and other text in the Standard, it is not really possible to assign glyphs uniformly and interchangeably to the code points in U+0000-U+001F and U+0080-U+009F. Variation selectors (sec. 16.4), for example, "provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character [examples given of CJK ideographs and Mongolian letters]." Variation selectors and other Unicode-defined control code points are ill-suited to causing C1 values to be displayed, because C1 values have no "display representation" in and of themselves.
PROPOSED CHARACTERS WITH NOTES
C1 Control Pictures
Hex  Name  Symbol for...
80   PAD   PADDING CHARACTER
           Allegedly not in ISO 6429. (Need to check historical versions; other sources.)
81   HOP   HIGH OCTET PRESET
           Allegedly not in ISO 6429. (Need to check historical versions; other sources.)
82   BPH   BREAK PERMITTED HERE
83   NBH   NO BREAK HERE
84   IND   INDEX
           "Move the active position one line down, to eliminate ambiguity about the meaning of LF. Deprecated in 1988 and withdrawn in 1992 from ISO/IEC 6429 (1986 and 1991 respectively for ECMA-48)." (from Wikipedia)
85   NEL   NEXT LINE
86   SSA   START OF SELECTED AREA
87   ESA   END OF SELECTED AREA
88   HTS   CHARACTER TABULATION SET
89   HTJ   CHARACTER TABULATION WITH JUSTIFICATION
8A   VTS   LINE TABULATION SET
8B   PLD   PARTIAL LINE DOWN
8C   PLU   PARTIAL LINE UP
8D   RI    REVERSE LINE FEED
8E   SS2   SINGLE SHIFT TWO
8F   SS3   SINGLE SHIFT THREE
90   DCS   DEVICE CONTROL STRING
91   PU1   PRIVATE USE ONE
92   PU2   PRIVATE USE TWO
93   STS   SET TRANSMIT STATE
94   CCH   CANCEL CHARACTER
95   MW    MESSAGE WAITING
96   SPA   START OF GUARDED AREA
97   EPA   END OF GUARDED AREA
98   SOS   START OF STRING
99   SGCI  SINGLE GRAPHIC CHARACTER INTRODUCER
           Allegedly not in ISO 6429. (Need to check historical versions; other sources.)
9A   SCI   SINGLE CHARACTER INTRODUCER
9B   CSI   CONTROL SEQUENCE INTRODUCER
9C   ST    STRING TERMINATOR
9D   OSC   OPERATING SYSTEM COMMAND
9E   PM    PRIVACY MESSAGE
9F   APC   APPLICATION PROGRAM COMMAND
A0   NBSP  NO-BREAK SPACE
           Purpose is to show in distinction to SP (SPACE)
AD   SHY   SOFT HYPHEN
           Show - with SHY above or around it, similar to Unicode Standard document for U+00AD
(SHY may be the most "controversial" character. See above for rationale--the objective is to provide visually distinct characters throughout the U+0000-U+00FF range. U+00AD is visually identical to the U+002D hyphen-minus; the only distinction is a "control" distinction, which is non-visual. Hence, the distinction should be made visually, with a distinct code point.)
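For anyone who wants to experiment before any code points exist, the table above can be carried as a simple lookup. This is only a sketch: the names C1_MNEMONICS and show_mnemonics and the bracketed output format are my own invention, and bracketed text is a stand-in for the proposed pictures, not an equivalent.

```python
# Abbreviations in table order, U+0080 through U+009F.
_C1_NAMES = (
    "PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3 "
    "DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC"
).split()

# Hypothetical lookup: code point -> abbreviation from the proposal table.
C1_MNEMONICS = {0x80 + i: name for i, name in enumerate(_C1_NAMES)}
C1_MNEMONICS[0xA0] = "NBSP"
C1_MNEMONICS[0xAD] = "SHY"


def show_mnemonics(text: str) -> str:
    """Replace each proposed character with its bracketed abbreviation."""
    return "".join(
        f"[{C1_MNEMONICS[ord(ch)]}]" if ord(ch) in C1_MNEMONICS else ch
        for ch in text
    )


print(show_mnemonics("a\u00a0b\u009b"))  # a[NBSP]b[CSI]
```

The lookup holds 34 entries in total: the 32 C1 controls plus NBSP and SHY.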
UNICODE CODE POINT ASSIGNMENTS
Unicode code point assignments are not explicitly advocated for in this initial, informal proposal. While it would be nice to place these codes adjacent to or in the U+2400 block, there are not enough free code points to shoehorn them all in.
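The space constraint can be checked with a quick count. This sketch assumes the Control Pictures block spans U+2400-U+243F (64 code points) and that 34 characters are needed (the 32 C1 controls plus NBSP and SHY from the table); the assigned count comes from whatever Unicode version is bundled with the Python interpreter, so the exact numbers may differ from those in Unicode 6.0.

```python
import unicodedata

BLOCK = range(0x2400, 0x2440)  # Control Pictures block: 64 code points
PROPOSED = 32 + 2              # U+0080..U+009F plus NBSP and SHY

# A code point counts as assigned if the bundled database has a name for it.
assigned = sum(1 for cp in BLOCK if unicodedata.name(chr(cp), None) is not None)
free = len(BLOCK) - assigned

print(f"{assigned} assigned, {free} free, {PROPOSED} needed")
```

If the free count is below 34, the proposed characters cannot all fit in the existing block and would need space elsewhere.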
MODIFICATIONS TO THE UNICODE STANDARD
It is proposed that section 15.6, Technical Symbols, be extended to discuss both C0 and C1 controls.
-Sean Leonard
SeanTek
Received on Sat Aug 13 2011 - 12:50:23 CDT