Feedback on Confusables or Identifier Restriction Data
UTS #39, Unicode Security Mechanisms
You can use one of the following two forms to suggest data additions or modifications for identifier characters in the next version of UTS #39, Unicode Security Mechanisms. For the latest version, see UTS #39, Unicode Security Mechanisms. A proposed update may be available for review.
- Identifier Restrictions
- The identifier restrictions indicate which characters should be restricted in general-purpose identifiers for security purposes. For example, characters of historic scripts are candidates for such exclusion. Restricting these characters reduces confusion and avoids the need to consider all of their possible confusables.
- Confusables
- Characters are confusable when they look similar enough to be mistaken for one another at normal font sizes in common UI fonts. The Unicode code charts show a representative glyph for each character, but the glyphs in common UI fonts matter more. Some of this data can be gathered mechanically, but many cases require human judgment. On the Mac, the Character Viewer shows the glyphs for a given character in different fonts; for Windows and Linux, see Selection from a Screen.
- The current data for UTS #39, Unicode Security Mechanisms focuses on “allowed” characters (see Restrictions). There is little data for other scripts, and the data is sparse for the Han script and for the scripts of South and Southeast Asia, such as Devanagari. Suggestions for improving the data for these and other scripts are welcome.
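To make the idea of confusable data concrete, the following Python sketch checks whether two strings are confusable by mapping each character to a look-alike prototype and comparing the results, modeled loosely on the skeleton transform that UTS #39 describes. The names SAMPLE_CONFUSABLES, skeleton, and confusable are hypothetical, and the four mappings are a hand-picked illustration, not the published confusables data.

```python
import unicodedata

# Hypothetical sample of confusable mappings: source character -> prototype.
# Illustrative only; this is NOT the UTS #39 confusables data.
SAMPLE_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A resembles LATIN SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE resembles LATIN SMALL LETTER E
    "\u043E": "o",  # CYRILLIC SMALL LETTER O resembles LATIN SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON resembles LATIN SMALL LETTER O
}


def skeleton(text: str) -> str:
    """Decompose, replace each character with its prototype, decompose again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(SAMPLE_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    """Treat two strings as confusable when their skeletons are identical."""
    return skeleton(a) == skeleton(b)
```

For example, `confusable("p\u0430ypal", "paypal")` is true because the Cyrillic а maps to Latin a, while `confusable("cat", "cut")` is false. A look-alike pair that is missing from the mapping table simply goes undetected, which is why gaps in the data matter.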
Data Formats
On the reporting forms Restrictions and Confusables, you can specify characters either as hex codes or as literal characters. When you specify a string of multiple characters, separate them with spaces. For example:
Example | Comment
〃 | U+3003 DITTO MARK
3003 | U+3003 DITTO MARK
a c | The string "ac"
0061 0063 | The same string, with hex codes.
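As an illustration of how a field in this format could be read, here is a minimal Python sketch. It assumes that a one-character token is a literal character and that a longer token is a hex code of 4 to 6 digits without a "U+" prefix, matching the examples in the table. The helper names are hypothetical, and the sketch treats each literal as a single code point.

```python
import re

# Assumption: hex codes are 4-6 hex digits with no "U+" prefix (e.g. "3003").
HEX_CODE = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_token(token: str) -> str:
    """Read one space-separated token as a literal character or a hex code."""
    if len(token) == 1:
        return token  # a literal character, such as "a" or "〃"
    if HEX_CODE.fullmatch(token):
        return chr(int(token, 16))  # e.g. "3003" -> U+3003 DITTO MARK
    raise ValueError(f"not a single character or hex code: {token!r}")


def parse_string(field: str) -> str:
    """Concatenate the characters named by a space-separated field."""
    return "".join(parse_token(tok) for tok in field.split())
```

With this reading, `parse_string("a c")` and `parse_string("0061 0063")` both return "ac", and `parse_string("3003")` returns "〃".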
For Restrictions, you can also specify a set of characters, using one of the following formats:
Example | Comment
a..c | All of the characters from a through c, in Unicode code point order.
0061..0063 | The same range, using hex codes.
[:blk=Greek:] | An entire block, or any other UnicodeSet.
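The two range formats can be expanded with a few lines of Python, sketched below under the same assumption as before: an endpoint is either one literal character or 4 to 6 hex digits. The function names are hypothetical. The sketch does not handle UnicodeSet patterns such as [:blk=Greek:]; those need a full UnicodeSet implementation, such as the one in ICU.

```python
import re

# Assumption: an endpoint is one literal character or 4-6 hex digits.
HEX_CODE = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_endpoint(token: str) -> int:
    """Return the code point named by a literal character or a hex code."""
    if len(token) == 1:
        return ord(token)
    if HEX_CODE.fullmatch(token):
        return int(token, 16)
    raise ValueError(f"bad range endpoint: {token!r}")


def parse_range(spec: str) -> set[str]:
    """Expand "x..y" into every character from x to y inclusive."""
    start_text, sep, end_text = spec.partition("..")
    if not sep:
        raise ValueError(f"missing '..' in {spec!r}")
    start, end = parse_endpoint(start_text), parse_endpoint(end_text)
    if start > end:
        raise ValueError(f"range is reversed: {spec!r}")
    return {chr(cp) for cp in range(start, end + 1)}
```

Both `parse_range("a..c")` and `parse_range("0061..0063")` yield the set {"a", "b", "c"}. Rejecting reversed ranges, rather than silently returning an empty set, makes input mistakes visible.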
Bulk Data
If you have a large amount of data that you would like to submit, please request instructions using the Unicode Contact Form.