Why Nothing Ever Goes Away
lists+unicode at seantek.com
Tue Oct 6 07:24:06 CDT 2015
> 2. The Unicode code charts are (deliberately) vague about U+0080, U+0081,
> and U+0099. All other C1 control codes have aliases to the ISO 6429
> set of control functions, but in ISO 6429, those three control codes
> do not have any assigned functions (or names).
On 10/5/2015 3:57 PM, Philippe Verdy wrote:
> Also the aliases for C1 controls were formally registered in 1983 only
> for the two ranges U+0084..U+0097 and U+009B..U+009F for ISO 6429.
If I may, I would appreciate another history lesson:
In ISO 2022 / 6429 land, it is apparent that the C1 controls are mainly
aliases for ESC 4/0 - 5/15. ( @ through _ ) This might vary depending on
what is loaded into the C1 register, but overall, it just seems like
saving one byte.
Why was C1 invented in the first place?
And, why did Unicode deem it necessary to replicate the C1 block at
0x80-0x9F, when all of the control characters (codes) were equally
reachable via ESC 4/0 - 5/15? I understand why it is desirable to align
U+0000 - U+007F with ASCII, and maybe even U+0000 - U+00FF with Latin-1
(ISO-8859-1). But maybe Windows-1252, MacRoman, and all the other
non-ISO-standardized 8-bit encodings got this much right: duplicating
control codes is basically a waste of very precious character code real
estate.
PS I was not able to turn up ISO 6429:1983, but I did find ECMA-48, 4th
Ed., December 1986, which has the following text:
5.4 Elements of the C1 Set
These control functions are represented:
- In a 7-bit code by 2-character escape sequences of the form ESC Fe,
where ESC is represented by bit combination 01/11 and Fe is represented
by a bit combination from 04/00 to 05/15.
- In an 8-bit code by bit combinations from 08/00 to 09/15.
This text is seemingly repeated in many analogous standards ca. ~1974 -
PPS I happen to have a copy of ANSI X3.41-1974 "American National
Standard Code Extension Techniques for Use with the 7-Bit Coded
Character Set of [ASCII]". The invention/existence of C1 goes back to
this time, as does the use of ESC Fe to invoke C1 characters in a 7-bit
code, and 0x80-0x9F to invoke C1 characters in an 8-bit code. (See, in
particular, Clauses 18.104.22.168 and 5.3.6). In particular, Clause 22.214.171.124
says: "The use of ESC Fe sequence in an 8-bit environment is contrary to
the intention of this standard but, should they occur, their meaning is
the same as in the 7-bit environment."
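To make the correspondence concrete, here is a minimal Python sketch of the two representations quoted above: an 8-bit C1 byte 0x80-0x9F on one side, and the pair ESC Fe (Fe in 0x40-0x5F) on the other, which differ by a fixed offset of 0x40. The function names are hypothetical and not taken from any standard. The sketch assumes raw bytes of an 8-bit code (not UTF-8, where U+0080-U+009F are encoded as two bytes), and it ignores every other kind of escape sequence.

```python
ESC = 0x1B  # bit combination 01/11


def c1_to_esc_fe(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 byte (0x80-0x9F) as the 7-bit pair ESC Fe."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # 08/00-09/15 maps onto 04/00-05/15: subtract 0x40.
            out += bytes((ESC, b - 0x40))
        else:
            out.append(b)
    return bytes(out)


def esc_fe_to_c1(data: bytes) -> bytes:
    """Rewrite each ESC Fe pair (Fe in 0x40-0x5F) as a single C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            # Anything else, including a lone trailing ESC, passes through.
            out.append(b)
            i += 1
    return bytes(out)


# CSI is 0x9B in an 8-bit code and ESC [ (0x1B 0x5B) in a 7-bit code.
seven_bit = c1_to_esc_fe(b"\x9b1m")
eight_bit = esc_fe_to_c1(seven_bit)
```

Each C1 control costs two bytes in the 7-bit form and one in the 8-bit form, which is the "saving one byte" mentioned earlier.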
I can appreciate why it was desirable to "fold" C1 in an 8-bit
environment into a 7-bit environment with ESC Fe. (If, in fact, that was
the direction of standardization: invent a new thing and then devise a
coding to express the new thing in the old thing.) It is less obvious
why Unicode adopted C1, however, when the trend was to jettison the
94-character Tetris block assignments in favor of a wide-open field for
character assignment. Except for the trend in Unicode to "avoid
assigning characters when explicitly asked, unless someone implements
them without asking, and the implementation catches on, and then just
assign the whole lot of them, even when they overlap with existing
assignments, and then invent composite characters, which further
compound the possible overlapping combinations".