RE: Naming of functional ASCII characters in Unicode

From: Marco.Cimarosti@icl.com
Date: Tue Jun 06 2000 - 05:26:23 EDT


Bernd Warken wrote:
> So the primary task of the ASCII-7 code is programming, not text
> processing. This makes the ASCII characters primarily functional.

This is wrong, as other people have already noted. The original main purpose
of ASCII (*American* Standard Code for Information Interchange) was to encode
texts in American English (not in international English: note, e.g., the
lack of the pound sterling sign).

You miss the fact that ASCII, just like the QWERTY keyboard, is a direct
descendant of the set of glyphs used on American typewriters.

This historical connection with typewriters, and the fact that daisy-wheel
printers were the "stdio" of computers, are key to understanding why some
characters are in ASCII, and why they have the names they do.

> " U+0022 QUOTATION MARK
> The character " is a double quote - just look at it.

Well, it is indeed a "quotation mark": a mark used to indicate quotation.
Whether it is single, double, high, low, opening, closing, etc. are details
that may or may not be needed.

> ' U+0027 APOSTROPHE

This is primarily an apostrophe: a mark that is needed in English to spell
the genitive ending "'s" and to indicate contractions ("it's", "ain't",
etc.).

It probably had other uses from the beginning:

- Serving as an acute accent (preceded by the backspace control) for
English words of foreign origin (e.g. "café").

- Serving as an alternate quotation mark, especially to mark nested
quotations.

> It should be renamed according to its function, i.e., SINGLE QUOTE or
> RIGHT QUOTE. These names were used for decades, before Unicode
> changed it.

The habit of using U+0060 and U+0027 together as a pair of opening/closing
quotation marks is typically Unixish, and is rarely found elsewhere.
This usage depends on the fact that U+0060 typically looks like a reversed
apostrophe (or comma) in Unix fonts. But in other environments, U+0060
really looks like an accent, and a quotation like `ciao' looks ugly with the
fonts I have.

Moreover, unlike ASCII, Unicode is not only for English. So using "left" or
"right" for defining opening and closing punctuation would be terribly
left-to-right biased. In a right-to-left context (e.g., in an Arabic section
of text) U+0028 looks like ")" and U+0029 looks like "(".
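The mirroring can be checked against the Unicode Character Database. What
follows is only a minimal sketch using Python's standard unicodedata module;
the helper name is_mirrored is made up for illustration:

```python
import unicodedata

def is_mirrored(ch: str) -> bool:
    """True if the character has the Bidi_Mirrored property, i.e. its
    glyph is flipped when it is displayed in right-to-left text."""
    return unicodedata.mirrored(ch) == 1

# Parentheses are mirrored in a right-to-left run; the ASCII quotation
# mark and apostrophe are not.
for ch in "()\"'":
    print(f"U+{ord(ch):04X} mirrored: {is_mirrored(ch)}")
```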

> ` U+0060 GRAVE ACCENT
> ^ U+005E CIRCUMFLEX ACCENT

These are indeed diacritical signs, occasionally used to write
foreign-language words. They were meant to be overlaid on the base letter
using the backspace control.
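The overstrike convention (base letter, backspace, accent) can be simulated
roughly as follows. This is a sketch only: the function name
compose_overstrike and the mapping table are my own assumptions, not part of
any standard, and the table maps each ASCII spacing accent to the Unicode
combining mark that seems closest to it.

```python
import unicodedata

# Assumed mapping from ASCII "accent" characters to combining marks.
OVERSTRIKE = {
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "`": "\u0300",  # COMBINING GRAVE ACCENT
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "~": "\u0303",  # COMBINING TILDE
}

def compose_overstrike(text: str) -> str:
    """Turn 'letter, backspace, accent' sequences into accented letters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\b" and out and nxt in OVERSTRIKE:
            # Attach the combining mark to the previous character.
            out[-1] += OVERSTRIKE[nxt]
            i += 2
        else:
            # Anything else, including a lone backspace, is kept as-is.
            out.append(ch)
            i += 1
    # NFC merges base letter + combining mark where a precomposed form exists.
    return unicodedata.normalize("NFC", "".join(out))

print(compose_overstrike("cafe\b'"))
```

Backspaces that are not followed by one of the listed accents are left
untouched, so ordinary text passes through unchanged.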

When video displays became the normal output device for computers, backspace
could no longer be used to overlay characters (it became in fact a sort
of "back delete" function). So characters like ` ^ ~ and _ were orphaned of
any function. It was only years later that these "archaeological" characters
were adopted and revitalized in the syntax of programming languages and OS
shells.

> - U+002D HYPHEN-MINUS
> The name HYPHEN-MINUS is not suitable, for there is already a
> printable hyphen, a printable minus sign, and several dashes.

This character is in fact ambiguous: it serves as a hyphen, as a minus
operator, and for many other purposes (e.g. as a nose, in ":-)").
The Unicode name just accounts for this ambiguity: what's wrong with this?
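For comparison, here are the names of the ambiguous ASCII character and of
its unambiguous relatives, looked up with Python's standard unicodedata
module. This is just an illustrative sketch; the selection of code points is
mine.

```python
import unicodedata

# The ASCII character plus the dedicated hyphen, minus sign and two dashes.
CANDIDATES = ["\u002D", "\u2010", "\u2212", "\u2013", "\u2014"]

names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in CANDIDATES}

for code_point, name in names.items():
    print(code_point, name)
```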

> Copyleft 2000 by Bernd Warken <bwarken@mayn.de>

Enhanced version (a few humble opinions added) copyleft 2000 by Marco
Cimarosti. Please see the archives of the Unicode List (www.unicode.org) for
the unabridged version of the posting, and for discordant instructions on
how to encode the copyleft sign.

_ Marco
 


