Re: Glyphs of new Unicode 3.0 symbols

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Wed Nov 25 1998 - 15:54:43 EST


Edward Cherlin <cherlin@2cowherd.com> wrote:

> I have had to use transliterations (more precisely, Romanizations for
> English, and in some cases French or German speakers) of Hebrew, Greek,
> Chinese, Japanese, Korean, and Russian for one reason or another. This is
> the voice of experience telling you:
>
> NO WAY.
>
> The world community does not accept transliterations, romanizations, or
> ASCIIizations designed by and for English speakers without a fight, and
> they are quite right not to do so.
> ...
> Bottom line: If you can create an ASCIIization of French that is acceptable
> to the French, you can come back and discuss this idea with us again.
>
Yet among computer users, it was an ASCII (or EBCDIC) world until recently,
and people can be surprisingly resourceful in the face of such limitations.

I am aware of several character sets that are interesting in this regard.
Each is a 7-bit character set like ASCII, but with the lowercase letters
replaced by letters of some other alphabet, such as Cyrillic, Greek, or
Hebrew.

The Cyrillic example is especially interesting in that it is designed to be
used interchangeably on ASCII and Cyrillic terminals. The characters in
columns 2-5 have their normal ASCII values, but the characters in columns
6-7 are replaced by Cyrillic letters, matched "by sound" to the corresponding
Roman letters in columns 4-5. Since there are more Cyrillic letters than
Roman ones, grave accent, braces, vertical bar, and tilde also stand for
Cyrillic letters that are not easily transcribed into ABCs, like SHA, SHCHA,
CHE, etc. This character set, Short KOI (KOI-7), was widely used in the
Soviet Union before the GUI era, especially in e-mail. If I received e-mail
in Short KOI on a regular ASCII terminal, the uppercase Roman letters would
be English (or another Roman-alphabet language) and the lowercase Roman
letters would be Cyrillic: I could read them "phonetically" and out would
come the Russian words. If I received it on a Short-KOI terminal, the
characters from columns 6 and 7 would appear in their true Cyrillic form
(uppercase). Thus the Russian text would be readable either way.
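A minimal Python sketch of what a Short-KOI terminal does with the bytes it
receives, assuming the commonly documented KOI-7 N2 layout for columns 6-7
(grave accent = Ю, a = А, b = Б, c = Ц, ..., { = Ш, | = Э, } = Щ, ~ = Ч).
The table, the function name short_koi_to_unicode, and the sample message are
illustrative assumptions, not taken from any official document.

```python
# Sketch of Short KOI (KOI-7 N2) display, ASSUMING the usual layout in which
# columns 6-7 (0x60-0x7E) hold uppercase Cyrillic letters matched "by sound".
SHORT_KOI_COLS_6_7 = "ЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧ"  # 0x60 .. 0x7E


def short_koi_to_unicode(data: bytes) -> str:
    """Render 7-bit Short KOI bytes the way a Short-KOI terminal would."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit byte: {b:#04x}")
        if 0x60 <= b <= 0x7E:
            # Columns 6-7: replaced by Cyrillic capitals.
            out.append(SHORT_KOI_COLS_6_7[b - 0x60])
        else:
            # Columns 0-5 (and DEL) keep their ASCII meaning.
            out.append(chr(b))
    return "".join(out)


msg = b"HELLO: priwet, mir!"
print(msg.decode("ascii"))        # what a plain ASCII terminal shows
print(short_koi_to_unicode(msg))  # what a Short-KOI terminal shows
```

The 31-letter table has no slot for the hard sign Ъ, since 0x7F is DEL. The
same letters reappear at 0xE0-0xFE in the later 8-bit KOI8-R, so clearing the
high bit of KOI8-R capitals yields exactly this Short KOI text.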

For untold numbers of people, this was a de facto transcription of Cyrillic to
ASCII. Although I don't know that it was ever a GOST standard, I do know
about (and have a copy of) at least one Soviet publication that documents this
character set, and I saw it in common use at several Soviet institutes in the
1980s.

The Hebrew (DEC VT100 Hebrew) and Greek (ELOT 927) counterparts are organized
the same way, except that the Hebrew and Greek letters are not arranged "by
sound" but in their regular alphabetical order. Still, I wonder whether a
similar system of transliteration ever arose from these sets (e.g., a = Alpha,
b = Beta, c = Gamma, ...).
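A rough Python sketch of the Hebrew case, to show what "regular order" means
in practice. It assumes DEC VT100 Hebrew uses the same layout as the Israeli
7-bit standard SI 960: ALEF at the grave-accent position (0x60) and the 27
letters, final forms included, running in alphabetical order through TAV at
'z' (0x7A). That layout assumption, the function name hebrew7_to_unicode, and
the sample word are illustrative, not taken from DEC documentation.

```python
# Sketch of 7-bit Hebrew display, ASSUMING an SI 960-style layout:
# ALEF at 0x60 (grave accent) ... TAV at 0x7A ('z'), final forms included.
HEBREW_FIRST = 0x60
HEBREW_LAST = 0x7A


def hebrew7_to_unicode(data: bytes) -> str:
    """Render 7-bit Hebrew bytes the way a Hebrew terminal would."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit byte: {b:#04x}")
        if HEBREW_FIRST <= b <= HEBREW_LAST:
            # Alphabetical order, which is also Unicode's order
            # starting at U+05D0 HEBREW LETTER ALEF.
            out.append(chr(0x05D0 + (b - HEBREW_FIRST)))
        else:
            out.append(chr(b))
    return "".join(out)


# SHIN, LAMED, VAV, FINAL MEM ("shalom"), stored in logical order.
word = b"ylem"
print(hebrew7_to_unicode(word))
```

On an ASCII terminal the same four bytes read "ylem", and 'a' is BET rather
than ALEF, so unlike Short KOI the lowercase Roman letters give no usable
phonetic reading.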

I'd be interested to hear (offline, since this kind of tangential discussion
irritates many people) relevant stories, or about other such schemes, and also
about any tricks used to transcribe Roman-alphabet texts in different languages
to ASCII, such as those used in German for umlauts. I am sure that computer
manuals from the 1970s and 80s from countries like Portugal, Norway, Italy,
Denmark, etc., contain instructions for entering native-language text on
ASCII-only terminals: "just enter the base letter without accents", "put the
accent after (before) the base letter", "write the ae digraph as a and e",
"write thorn as th", etc. These are no doubt all offensive to the purist, but
they are nevertheless of interest historically, and even today in certain
environments.
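A few of the conventions listed above can be sketched in Python as a fallback
"ASCIIizer": a small substitution table for letters that conventionally become
two letters (German umlauts such as ü to ue, ß to ss, the ae digraph, thorn
to th), then "just enter the base letter without accents" for everything else,
done by Unicode-decomposing each character and dropping the combining marks.
The table and the name asciify are hypothetical; real manuals differed by
language, and letters with no decomposition (such as ø) simply become "?"
here.

```python
import unicodedata

# Hypothetical substitution table for letters written as two ASCII letters.
DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
    "æ": "ae", "Æ": "Ae",
    "þ": "th", "Þ": "Th",
}


def asciify(text: str) -> str:
    """Lossy ASCII transcription using a few historical conventions."""
    out = []
    for ch in text:
        if ch in DIGRAPHS:
            out.append(DIGRAPHS[ch])
            continue
        # "Just enter the base letter without accents": decompose the
        # character (NFD) and drop any combining marks.
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        out.append(base if base.isascii() else "?")
    return "".join(out)


print(asciify("Müller, Straße, São Paulo, Þórr"))
# prints: Mueller, Strasse, Sao Paulo, Thorr
```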

- Frank


