Characters in early sets

From: Doug Ewell (dewell@compuserve.com)
Date: Fri Nov 17 2000 - 03:46:28 EST


I recently received my copy of Mackenzie's "Coded Character Sets" from
Amazon's out-of-print search team, and between that and the "UNIVAC
Memories" Web site at www.fourmilab.com, I have been pondering some of
the characters that appear in early character sets and how they would
map to Unicode.

I'm not talking about so-called "legacy" character sets like ISO 8859,
of course. I mean *really* old stuff like BCDIC, PTTC, and FIELDATA,
character sets from the days of 6-bit bytes. I admit my interest in
these mappings is purely recreational, but I can always defend myself
by pointing to Unicode's stated goal of providing round-trip mappings
to all character sets in existence around 1990.

Frank da Cruz recently told a story about the "lozenge" character that
appeared in some of these early sets. Nowadays it is often represented
by U+00A4 CURRENCY SIGN, but it appears to be more closely matched by
U+2311 SQUARE LOZENGE. Would anyone disagree?

Some early character sets have a Greek capital delta. The "obvious"
mapping is to U+0394 GREEK CAPITAL LETTER DELTA, but there is also
U+2206 INCREMENT, which has an identical glyph. There is no apparent
intent here to represent either the Greek D sound or the mathematical
concept normally associated with "delta" or "increment"; it's just a
symbol. Occasionally an inverted capital delta is seen, and that
character can only be mapped to U+2207 NABLA, which is a symbol rather
than a letter. To keep the two consistent, I am inclined to map the
delta-as-symbol to U+2206 rather than U+0394. Does this seem reasonable?
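
For anyone who wants to compare the three candidates side by side, here
is a minimal sketch using Python's standard unicodedata module. The
labels in CANDIDATES and the describe() helper are my own hypothetical
names for illustration; only the code points come from the discussion
above.

```python
import unicodedata

# Informal labels (not official names) for the code points discussed above.
CANDIDATES = {
    "delta as letter": 0x0394,
    "delta as symbol": 0x2206,
    "inverted delta": 0x2207,
}

def describe(cp):
    """Return the code point, its Unicode name, and its general category."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"

if __name__ == "__main__":
    for label, cp in CANDIDATES.items():
        print(f"{label}: {describe(cp)}")
```

The general category is what separates the candidates: U+0394 is a
letter, while U+2206 and U+2207 are both math symbols, which is the
consistency argument made above.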

On pages 100 and 102 Mackenzie describes other graphics that were
intentionally designed into BCDIC to "have no intrinsic meaning" and
to "cause customers to be disinclined to use them in applications."
They are:

- lower-case 'b' with crossbar (U+2422)
- lower-case Greek gamma (U+03B3)
- vertical bar with two horizontal bars (proposed for U+29E7 in Unicode
  3.2)
- vertical bar with three horizontal bars (can't find this one)
- lozenge (U+2311 as described above)
- capital Greek delta (U+2206 as described above)
- check mark or square-root radical (U+221A)
- horizontal bar with three vertical bars (proposed for U+29FB in
  Unicode 3.2)

The mappings to U+29E7 and U+29FB don't seem quite right; the glyphs
match fairly well, but the Unicode characters each have a specific
meaning, whereas the BCDIC graphics were deliberately given none. And I
still can't find a match for the vertical bar with three horizontal
bars. Does anyone else see it?
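
The list above can be written down as a small lookup table. This is only
a sketch of the tentative mappings, not a settled mapping table: the
name BCDIC_ODDITIES, the helper functions, and the text labels are
hypothetical, and None marks the one graphic with no match so far.
Whether U+29E7 and U+29FB have names depends on the Unicode version
behind your Python's unicodedata module, so the lookup supplies a
fallback.

```python
import unicodedata

# Labels are informal descriptions from the list above, not official names.
# None marks the graphic for which no Unicode match has been found.
BCDIC_ODDITIES = [
    ("lower-case b with crossbar", 0x2422),
    ("lower-case Greek gamma", 0x03B3),
    ("vertical bar with two horizontal bars", 0x29E7),
    ("vertical bar with three horizontal bars", None),
    ("lozenge", 0x2311),
    ("capital Greek delta", 0x2206),
    ("check mark or square-root radical", 0x221A),
    ("horizontal bar with three vertical bars", 0x29FB),
]

def unmapped(table):
    """Return the labels that still have no proposed code point."""
    return [label for label, cp in table if cp is None]

def report(table):
    """Return one line per graphic with its proposed code point and name."""
    lines = []
    for label, cp in table:
        if cp is None:
            lines.append(f"{label}: no match")
        else:
            name = unicodedata.name(chr(cp), "<no name in this Unicode version>")
            lines.append(f"{label}: U+{cp:04X} {name}")
    return lines

if __name__ == "__main__":
    print("\n".join(report(BCDIC_ODDITIES)))
```

Printing the official names next to the informal descriptions makes the
mismatch easy to see: the names describe particular uses, while the
BCDIC graphics were chosen to have none.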

The fun continues with wonders like the small digits in the Stretch
(IBM 7030) character set. Lemme see, U+2080 through U+2089....
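
If the small digits really do map to U+2080 through U+2089 (an
assumption on my part, not something Mackenzie states), the conversion
is a simple offset from the ordinary digits. A minimal sketch, with
hypothetical names SMALL_DIGITS and to_small_digits:

```python
# Assumption: the Stretch "small digits" correspond to the Unicode
# subscript digits U+2080..U+2089, in order 0 through 9.
SMALL_DIGITS = str.maketrans(
    "0123456789",
    "".join(chr(0x2080 + d) for d in range(10)),
)

def to_small_digits(text):
    """Replace ASCII digits with subscript digits; leave other characters alone."""
    return text.translate(SMALL_DIGITS)

if __name__ == "__main__":
    print(to_small_digits("7030"))
```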

-Doug Ewell
 Fullerton, California


