Re: Naming of functional ASCII characters in Unicode

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Tue Jun 06 2000 - 05:01:14 EDT


Bernd Warken wrote:
>
> The Unicode ASCII range U+00-7F still shows elements of the out-dated
> glyph approach instead of the intended character abstraction.

Probably, but I believe that backward compatibility greatly outweighs
the intellectual construction.

 
> Historically, the 7-bits ASCII characters were used for databases and
> programming languages.

If "databases" includes any form of Latin text, yes.

> In later years, text processing required better representations for
> some of these functional characters.

Yes (we always need "better" in computer technology).

> This led to extensions like the well-known code-pages, ISO character
> standards, and Unicode.

Uh?

I fail to see the connection between, on one side, IBM code pages and
ISO character set standards (of which Unicode can be taken as a
variation), which aim at extending the repertoire and at giving
computers ways to use these extended repertoires, and, on the other
side, the requirement for better representations of Latin text
encoded as ASCII.

And if a connection exists, it is really tiny.

 
> So the primary task of the ASCII-7 code is programming, not text
> processing.

Where on earth did you prove that? You stated two sentences ago
that ASCII was used for databases and programming. Now you slip
and completely forget the database side, and even seem to oppose
databases and text processing... (if text stored in databases,
as opposed to numbers, is not intended for text processing,
what is it intended for?)

Also, the most widely available programming languages, such as
Fortran, Pascal, C or Java, carefully avoid tying themselves
to ASCII for the encoding of their sources (using different
techniques). Ada83 did tie itself to ASCII, but this was seen as
a defect, corrected in Ada95.

> Unfortunately, some names (and glyphs) do not reflect this functional
> meaning.

You are beating a dead horse. This problem is well known.
The ISO 10646 solution has been to modify the standard and
to prefer the notation U+xxxx over the English name as the
identifier of a character (also, ISO 10646 has to cater to
these pesky Frenchies who insist on not using English
everywhere ;-)). I do not know precisely what Unicode's position
is on this topic, but if you feel uncomfortable with these
names, please use the U+00XX form, which is completely unambiguous.
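
For illustration, the U+XXXX form can be derived mechanically from a
code point. The following is only a minimal sketch in Python, assuming
the standard `unicodedata` module; the helper name `u_notation` and the
sample characters are hypothetical choices made for this example.

```python
import unicodedata


def u_notation(ch: str) -> str:
    """Return the U+XXXX form of one character (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


# Print the unambiguous identifier next to the English name that
# Python's Unicode database records for a few ASCII characters.
for ch in ['"', "'", "-", "`"]:
    print(u_notation(ch), unicodedata.name(ch))
```

Whatever one thinks of the English names, the first column identifies
each character without relying on them.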

And no, the names will not be changed.

 
> This might not seem a big problem today, but there are some long-term
> considerations in some interpreter languages to include wide characters
> in writing program code. At this point, the difference between
> functional characters and printable representations will become crucial.

Please elaborate on the problem you see.

I put up with these English names, which look completely foreign
when used alongside French text, and see no more problem than when
I use / to mean "such as", or an epsilon-like sign to mean "pertains to"
in a mathematical context, or when I use theta to abbreviate "-tion"
when I take notes. These are just conventions, and as such should
be learned and taken as is (furthermore, it is quite clear that they
will not be modified, so time is not wasted if you learn them!)

 
> " U+0022 QUOTATION MARK
<snip>
> Copyleft 2000 by Bernd Warken <bwarken@mayn.de>

Since you mention it, by default, my text is copyrighted. To me.

Antoine


