Unicode 1.0 names for control characters

From: DougEwell2@cs.com
Date: Tue Dec 04 2001 - 00:30:54 EST


I am surprised and puzzled by the "Unicode 1.0 Name" changes for some of the
ASCII and Latin-1 control characters that were introduced in the latest beta
version of the Unicode 3.2 data file (UnicodeData-3.2.0d5.txt):

U+0009 HORIZONTAL TABULATION ==> CHARACTER TABULATION
U+000B VERTICAL TABULATION ==> LINE TABULATION
U+001C FILE SEPARATOR ==> INFORMATION SEPARATOR FOUR
U+001D GROUP SEPARATOR ==> INFORMATION SEPARATOR THREE
U+001E RECORD SEPARATOR ==> INFORMATION SEPARATOR TWO
U+001F UNIT SEPARATOR ==> INFORMATION SEPARATOR ONE
U+008B PARTIAL LINE DOWN ==> PARTIAL LINE FORWARD
U+008C PARTIAL LINE UP ==> PARTIAL LINE BACKWARD
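
A change list like the one above can be reproduced by comparing the
"Unicode 1.0 Name" field (the eleventh semicolon-separated field, index 10)
across two versions of UnicodeData.txt. The Python below is a minimal sketch
that assumes that standard layout. The helper names are hypothetical, and the
sample records are illustrative. They are built only from the names discussed
in this message and are not copied from the 3.2.0d5 beta file.

```python
# Index of the "Unicode 1.0 Name" field in a UnicodeData.txt record
# (0-based; field 0 is the hexadecimal code point).
UNICODE_1_NAME_FIELD = 10


def unicode1_names(lines):
    """Map code point -> Unicode 1.0 name, for records where that field is non-empty."""
    names = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split(";")
        name1 = fields[UNICODE_1_NAME_FIELD]
        if name1:
            names[int(fields[0], 16)] = name1
    return names


def changed_unicode1_names(old_lines, new_lines):
    """Yield (code point, old name, new name) wherever the 1.0 name differs."""
    old = unicode1_names(old_lines)
    new = unicode1_names(new_lines)
    for cp in sorted(old.keys() & new.keys()):
        if old[cp] != new[cp]:
            yield cp, old[cp], new[cp]


# Illustrative records only (hypothetical samples, not real file contents).
old_data = [
    "0009;<control>;Cc;0;S;;;;;N;HORIZONTAL TABULATION;;;;",
    "000B;<control>;Cc;0;S;;;;;N;VERTICAL TABULATION;;;;",
    "000C;<control>;Cc;0;WS;;;;;N;FORM FEED;;;;",
]
new_data = [
    "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;",
    "000B;<control>;Cc;0;S;;;;;N;LINE TABULATION;;;;",
    "000C;<control>;Cc;0;WS;;;;;N;FORM FEED (FF);;;;",
]

for cp, before, after in changed_unicode1_names(old_data, new_data):
    print(f"U+{cp:04X} {before} ==> {after}")
```

Run against two real copies of the file (e.g. the lines of 3.1 and 3.2 beta
UnicodeData.txt), the same loop would list every 1.0-name change, including
the added "(LF)"-style abbreviations.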

Were these "new" names (e.g. CHARACTER TABULATION) really the original
Unicode 1.0 names? I don't have my 1.0 book close at hand, but I know that
they were *not* the names used in 1.1, according to the file "namesall.lst"
from that version. (Aha, you didn't think anyone still had that dusty old
thing lying around, did you?)

IMHO, the new names CHARACTER TABULATION and LINE TABULATION are much less
intuitive than HORIZONTAL TABULATION and VERTICAL TABULATION. Sometimes you
even see the abbreviations HT and VT for these two characters. The new names
appear to have been invented by someone who imagined a lack of clarity in the
old names.

I have seen the names IS4, IS3, IS2, and IS1 before, but they do not convey
the same information as FS, GS, RS, and US. The latter names are more
specific.

The "old" names for these six control characters were used as far back as the
original 1963 version of ASCII, according to Mackenzie (pp. 245-247).

I don't know about the history of U+008B and U+008C, but again it seems
strange that the "Unicode 1.0 name" for these characters is being changed at
this late date.

I know this 1.0 name field is not subject to the same rule of "no changes,
ever" that applies to the regular Character Name field, but why should these
names be changed at all?

On this same topic, parenthesized abbreviations have been added to the 1.0
names for U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE
RETURN (CR), and U+0085 NEXT LINE (NEL). Does the addition of these
abbreviations mean that they are now part of the official 1.0 name, and if
so, why? Other characters typically don't have abbreviations as part of
their names, even if they are as meaningful and as commonly used as these,
and again it is a change from the 1.0 name we have seen for a decade.

Perhaps I've been checking the beta files a bit TOO carefully.

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Dec 04 2001 - 00:15:20 EST