Feedback on Confusables or Identifier Restriction Data
You can use one of the following two forms to suggest data additions or modifications for
identifier characters in the next version of
UTS #39, Unicode Security Mechanisms.
- Identifier Restrictions
- The identifier restrictions provide information about which characters should be restricted in general-purpose identifiers for security purposes. For example, historic scripts are candidates for such exclusion. Removing these characters reduces confusion and avoids the need to consider all of the possible confusable characters.
- Identifier Confusables
- Characters are confusable when they can be confused at normal font sizes in common UI fonts. While the Unicode code charts can be consulted for a representative glyph, what is more important are the glyphs that are in common UI fonts. Some of this data can be gathered mechanically, but there are many cases where human judgment is needed. On the Mac, the Character Viewer can be used to see glyphs in different fonts for a given character; for Windows and Linux, see Selection from a Screen.
- The focus of the current data for UTS #39, Unicode Security Mechanisms, is on “allowed” characters (in Identifier Restrictions). There is little data for other scripts, and the data is sparse for the Han script and for scripts of South and Southeast Asia, such as Devanagari. Suggestions for improvements for these and other scripts are welcome.
Data Formats
On the reporting forms Identifier Restrictions and Identifier Confusables, you can specify characters either with hex codes or as literal characters. Where you specify a string of multiple characters, please separate them with spaces. For example:
| Example | Comment |
| --- | --- |
| 〃 | U+3003 DITTO MARK |
| 3003 | U+3003 DITTO MARK |
| a c | The string "ac" |
| 0061 0063 | The same string, with hex codes. |
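The conventions in the table above can be sketched in Python. This is only an illustration, not the parser the forms actually use: the names `parse_token` and `parse_string` are hypothetical, and the rule that a token of 4 to 6 hex digits is a code point while any other token must be a single literal character is an assumption made for this sketch.

```python
import re

# Assumption for this sketch: a token of 4-6 hex digits is a code point;
# anything else must be exactly one literal character.
HEX_TOKEN = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_token(token: str) -> str:
    """Turn one space-separated token into a single character."""
    if HEX_TOKEN.fullmatch(token):
        code_point = int(token, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"not a Unicode code point: {token!r}")
        return chr(code_point)
    if len(token) == 1:
        return token
    raise ValueError(f"expected a hex code or one character: {token!r}")


def parse_string(field: str) -> str:
    """Turn a whole field (tokens separated by spaces) into a string."""
    return "".join(parse_token(token) for token in field.split())


# The four rows of the table:
ditto_literal = parse_string("〃")         # U+3003 DITTO MARK
ditto_hex = parse_string("3003")          # the same character
ac_literal = parse_string("a c")          # the string "ac"
ac_hex = parse_string("0061 0063")        # the same string
```

Under this assumed rule, literal characters and hex codes cannot collide, because a literal token is always one character long and a hex token is always at least four.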
For Identifier Restrictions, you can also specify a set of characters. A set should use one of the following formats:
| Example | Comment |
| --- | --- |
| a..c | All of the characters between a and c, in Unicode order. |
| 0061..0063 | The same range, using hex codes. |
| [:blk=Greek:] | An entire block, or other UnicodeSet. |
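The range notation in the first two rows can be expanded as in the following Python sketch. It is an illustration under stated assumptions, not the forms' own logic: `to_code_point` and `expand_range` are hypothetical names, the 4 to 6 hex digit rule for endpoints is assumed, and both endpoints are treated as included. UnicodeSet expressions such as `[:blk=Greek:]` are outside its scope and are rejected.

```python
import re


def to_code_point(token: str) -> int:
    """Read a range endpoint given as a hex code or a literal character."""
    # Assumption for this sketch: 4-6 hex digits means a code point.
    if re.fullmatch(r"[0-9A-Fa-f]{4,6}", token):
        code_point = int(token, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"not a Unicode code point: {token!r}")
        return code_point
    if len(token) == 1:
        return ord(token)
    raise ValueError(f"unsupported endpoint: {token!r}")


def expand_range(spec: str) -> list[str]:
    """Expand 'start..end' into its characters, in code point order."""
    start_text, separator, end_text = spec.partition("..")
    if not separator:
        # No '..' present: treat the whole spec as a single character.
        return [chr(to_code_point(spec.strip()))]
    start = to_code_point(start_text.strip())
    end = to_code_point(end_text.strip())
    if start > end:
        raise ValueError(f"range start is after range end: {spec!r}")
    return [chr(code_point) for code_point in range(start, end + 1)]


letters = expand_range("a..c")            # ['a', 'b', 'c']
same_letters = expand_range("0061..0063")  # the same three characters
```

"Unicode order" is read here as code point order, which is why the expansion is a simple walk over the integers from the first endpoint to the last.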
Bulk Data
If you have a large amount of data that you would like to submit, please request instructions using the Unicode Contact Form.