Punctuation symbols

L2/12-146
Source: Mark Davis
Date: April 27, 2012
Subject: Punctuation symbols

The UTC received a question about why certain characters, such as # and @, are classed as punctuation when they seem more accurately characterized as symbols, and when seemingly similar characters, such as the section sign (§) and copyright sign (©), are classed as symbols. The question came in late in the release cycle, and we did not have time to consider the issue in depth or to collect public feedback before the release. So we temporized by noting in an FAQ (http://www.unicode.org/faq/punctuation_symbols.html) that the line is somewhat vague, and that people can override the assignments (to some extent).

The categorization makes a significant difference to implementations. For example, punctuation is commonly ignored in searching and collation (e.g., in CLDR or with the IgnoreSP option in UCA), and the difference matters in many other kinds of processing (symbols are commonly excluded from registered personal names, for example). While the line between punctuation and symbol is always somewhat fuzzy, we should ensure that these characters have the best General_Category (GC) values for normal implementations. On the other hand, we need to consider whether a change would cause any problems.
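To illustrate how the categorization can affect matching, here is a minimal sketch of a search key that drops punctuation by General Category and keeps symbols. The function name search_key is hypothetical, and the logic is a simplification for illustration; it is not how CLDR or UCA implement their options.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build a loose matching key (hypothetical helper, for illustration only).

    Characters whose General Category starts with "P" (punctuation) are
    dropped; everything else, including symbols (S*), is kept and casefolded.
    """
    kept = (ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return "".join(kept).casefold()
```

Under this sketch, "C#" and "C" produce the same key because # is currently punctuation, while a string containing © keeps that character because it is a symbol. Reclassifying # as a symbol would change that behavior.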

So, now that we have time, we should put out a PRI to collect feedback on whether to change any or all of the following characters to symbols, mentioning the reasons for doing so, and the countervailing stability argument, so that we can weigh the pros and cons of a change in the committee.

U+0023 ( # ) NUMBER SIGN
U+0026 ( & ) AMPERSAND

U+002D ( - ) HYPHEN-MINUS

U+0040 ( @ ) COMMERCIAL AT
U+0025 ( % ) PERCENT SIGN
U+2030 ( ‰ ) PER MILLE SIGN
U+2031 ( ‱ ) PER TEN THOUSAND SIGN
U+002A ( * ) ASTERISK
U+2020 ( † ) DAGGER
U+2021 ( ‡ ) DOUBLE DAGGER
U+203B ( ※ ) REFERENCE MARK
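For reference, the current GC values of the characters listed above can be inspected with a short script. This is a sketch using Python's standard unicodedata module; the names CANDIDATES and general_categories are hypothetical, and the values reported depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

# The characters proposed for review in the list above.
CANDIDATES = "#&-@%\u2030\u2031*\u2020\u2021\u203b"


def general_categories(chars):
    """Map each character's code point label to its General Category value."""
    return {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in chars}


if __name__ == "__main__":
    for code_point, gc in general_categories(CANDIDATES).items():
        print(code_point, gc)
```

Each listed character should report a punctuation category (a value starting with "P"), whereas the copyright sign passed to the same function reports a symbol category.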