RE: Naming of functional ASCII characters in Unicode

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Jun 11 2000 - 00:36:34 EDT


At 02:53 PM 6/7/00 -0800, Yves Arrouye wrote:
>we have normative names
>that may look like they need to be fixed, and then alternative names that,
>as has been suggested, could be accepted by a given Unicode
>implementation. But an application that wants to be safe and understood
>unambiguously will stick to the normative names.

There's still a fundamental misunderstanding here somewhere about what the
'normative character names' in ISO/IEC 10646 and Unicode are all about.

Their normative purpose is to uniquely identify a character. The
*guidelines* for creating names suggest that common names be used for the
character name, but do not require that the identifying name must carry
every nuance of usage.
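
As an illustration, here is a sketch using Python's standard `unicodedata` module. It reflects today's Unicode Character Database rather than the data available in 2000, and alias lookup assumes Python 3.3 or later. The normative name round-trips to exactly one code point. The C0 control characters, the "functional ASCII characters" of this thread, have no character name at all in the current data, only formal aliases such as LINE FEED. `describe_name` is a hypothetical helper:

```python
import unicodedata

def describe_name(ch: str) -> str:
    """Return the normative character name, or a placeholder for characters
    (such as C0 controls) whose Name property is empty."""
    # unicodedata.name() raises ValueError for unnamed characters unless a default is given.
    return unicodedata.name(ch, "<no name>")

# The normative name identifies exactly one character, so it round-trips.
for ch in "A\u00e9\u2212":
    name = unicodedata.name(ch)
    assert unicodedata.lookup(name) == ch
    print(f"U+{ord(ch):04X} {name}")

# C0 controls have no Name property; lookup() also accepts the formal
# aliases from NameAliases.txt (Python 3.3+), e.g. "LINE FEED".
print(describe_name("\n"))                      # <no name>
print(unicodedata.lookup("LINE FEED") == "\n")  # True
```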

ISO/IEC 10646 provides very little additional information about a character
other than a suggestive name (whose main function is to be unique), a
suggestive picture (which is not normative) and a location in the code
space. A few characters are listed as combining (normative) or mirroring
(informative).
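
For comparison, a rough Python sketch of the per-character data that 10646 itself carries (a code position, a name, and the combining and mirroring flags) might look like this. `iso_view` is a hypothetical helper; the values come from the current Unicode Character Database via `unicodedata`, and the "combining" test based on General Category is only an approximation of the 10646 list:

```python
import unicodedata

def iso_view(ch: str) -> dict:
    """Hypothetical helper: roughly the per-character data 10646 provides."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        # Approximation: treat General Category Mn, Mc, Me (marks) as combining.
        "combining": unicodedata.category(ch).startswith("M"),
        # Bidi_Mirrored property: 1 for characters such as parentheses.
        "mirrored": unicodedata.mirrored(ch) == 1,
    }

for ch in ("(", "\u0301", "A"):
    print(iso_view(ch))
```

Everything beyond these few fields has to come from somewhere other than the name.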

While this system, together with the working memory of the working group
putting together the standard, might be enough to identify characters
well enough to create the standard without duplicates and to maintain it
in the future, Unicode realized early on that substantially more
information is needed: *users* need it to decide which character code to
use for which purpose, and software developers need it to design
algorithms that manipulate these characters according to users'
expectations.

In the Unicode Standard you will find a long list of properties that are
defined for all characters of the standard, and many more properties that
are defined for only those characters where they matter. Many of these
properties are even considered normative in Unicode. In addition, there are
the aliases, cross references, and other annotations that help people not
only to identify a character correctly, but also to get a pretty good idea
of its intended (or supported) range of uses.

Retreating to the 10646 name as the only information (other than the
suggestive shape) about a character therefore means replacing publicly
available, well-maintained and precise information about that character
with an implicit reference to the unstated (except possibly in its
historical working documents) intentions of the working group.

In most situations, end-user selection of an appropriate character is
mediated by an input method (in the widest sense) specific to a given
language or notational system. In complex or especially powerful cases,
designers have many ways of guiding the user's final selection other than
relying on the somewhat artificial ISO/IEC 10646 character names.

For developer (or power-user) oriented utilities, the best approach is to
make *all* relevant information about the character available.

A./