L2/12-146
Source: Mark Davis
Date: April 27, 2012
Subject: Punctuation symbols
The UTC received a question asking
why certain characters, such as # and @, are classed as punctuation when
they seem more accurately characterized as symbols, while seemingly
similar characters, such as the section sign (§) and the copyright
sign (©), are classed as symbols.
This came in late in the release cycle, and we didn't have time to
consider the issue in depth and collect public feedback before the
release. So we temporized by noting in an FAQ (http://www.unicode.org/faq/punctuation_symbols.html) that
the line between punctuation and symbols is somewhat vague, and that
implementations can override the General_Category values (to some
extent).
The categorization makes a
significant difference to implementations. For example, punctuation
is commonly ignored in searching and collation (e.g., in CLDR, or with
the IgnoreSP option in UCA), and the distinction matters in many other
kinds of processing (for example, symbols are commonly excluded from
registered personal names). While the line between punctuation and
symbols is always somewhat fuzzy, we should ensure that these
characters have the best General_Category (GC) values for typical
implementations. On the other hand, we need to consider whether a
change would cause any problems.
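As a rough illustration of why the category matters, here is a minimal Python sketch, assuming a hypothetical search key that simply drops every character whose General_Category starts with "P" and casefolds the rest. This is a stand-in, not the UCA or CLDR algorithm. With # classed as punctuation, "C#" and "C" produce the same key, while © (a symbol) is kept.

```python
import unicodedata

def is_punctuation(ch: str) -> bool:
    # All punctuation General_Category values start with "P"
    # (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return unicodedata.category(ch).startswith("P")

def search_key(text: str) -> str:
    # Hypothetical, simplified search key: drop punctuation, then casefold.
    # Real collation-based matching (UCA/CLDR) is far more involved.
    return "".join(ch for ch in text if not is_punctuation(ch)).casefold()

for a, b in [("C#", "C"), ("user@example", "userexample"), ("©2012", "2012")]:
    print(f"{a!r} vs {b!r}: same key = {search_key(a) == search_key(b)}")
```

The category data comes from the Unicode version bundled with the local Python build; the values used here have not changed across versions.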
So, now that we have time, we
should put out a Public Review Issue (PRI) to collect feedback on
whether to change any or all of the following characters to symbols,
mentioning the reasons for doing so and the countervailing stability
argument, so that we can weigh the pros and cons of a change in the
committee.
U+0023 ( # ) NUMBER SIGN
U+0026 ( & ) AMPERSAND
U+002D ( - ) HYPHEN-MINUS
U+0040 ( @ ) COMMERCIAL AT
U+0025 ( % ) PERCENT SIGN
U+2030 ( ‰ ) PER MILLE SIGN
U+2031 ( ‱ ) PER TEN THOUSAND SIGN
U+002A ( * ) ASTERISK
U+2020 ( † ) DAGGER
U+2021 ( ‡ ) DOUBLE DAGGER
U+203B ( ※ ) REFERENCE MARK
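To check where the listed characters currently stand, a short sketch can report each one's name and General_Category using Python's unicodedata module. The values reflect whatever Unicode version the local Python build ships, so treat the output as a convenience check rather than an authoritative statement of the data.

```python
import unicodedata

# Code points listed above as candidates for reclassification as symbols.
CANDIDATES = [0x0023, 0x0026, 0x002D, 0x0040, 0x0025, 0x2030, 0x2031,
              0x002A, 0x2020, 0x2021, 0x203B]

def report(code_points):
    # Build (label, character, name, General_Category) tuples for each code point.
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", ch, unicodedata.name(ch),
                     unicodedata.category(ch)))
    return rows

rows = report(CANDIDATES)
for label, ch, name, gc in rows:
    print(f"{label} ( {ch} ) {name}: {gc}")
```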