L2/10-446R2
From: Mark Davis
Date: Nov 4, 2010
Subject: Proposal for enhancements to UTS#46
Proposal for enhancements to UTS#46 for next Unicode version.
1. We've gotten feedback that it would be useful to indicate whether a character, or a test case, is allowed in IDNA2008. The proposal is to issue a 6.0.1 version of UTS #46 with an additional optional field in the data files http://www.unicode.org/Public/idna/latest/IdnaMappingTable.txt and http://www.unicode.org/Public/idna/latest/IdnaTest.txt.
We define an acronym "NV8" to indicate that at least one code point in a string is DISALLOWED under all versions of IDNA2008 (at or after Unicode 5.2).
The optional fields with this acronym will appear in the following cases:
For the IdnaTest.txt file, the extra field appears if the toUnicode value (field 3) contains any character that is NV8. The NV8 field does not otherwise appear.
Example:
B; fass.de;
B; xn--53h; ☕; xn--53h ; NV8
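A rough sketch of how a consumer of IdnaTest.txt might pick up the proposed field is shown here. It assumes the layout in the two example lines above: semicolon-separated fields, an optional trailing NV8 field, and "#" starting a comment. The function name and the dictionary keys (type, source, to_unicode, to_ascii) are illustrative and are not defined by this proposal; escape sequences and the meaning of empty fields are deliberately left uninterpreted.

```python
def parse_test_line(line):
    """Split one IdnaTest.txt-style line into fields plus an NV8 flag.

    Assumed layout (taken from the examples above, not from a spec):
        type ; source ; toUnicode ; toASCII [; NV8]
    Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]

    # The proposed field is optional and, when present, comes last.
    nv8 = fields[-1] == "NV8"
    if nv8:
        fields = fields[:-1]

    # Pad so that short lines still yield four fields; empty fields are
    # kept as empty strings rather than interpreted.
    fields += [""] * (4 - len(fields))
    return {
        "type": fields[0],
        "source": fields[1],
        "to_unicode": fields[2],
        "to_ascii": fields[3],
        "nv8": nv8,
    }


if __name__ == "__main__":
    for sample in ("B; fass.de;", "B; xn--53h; ☕; xn--53h ; NV8"):
        print(parse_test_line(sample))
```

A test harness could use the flag to skip, or to expect failure for, NV8 cases when it is checking an IDNA2008 implementation rather than a UTS #46 one.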
For the IdnaMappingTable.txt file, the extra field appears:
Examples:
0030..0039 ; valid # 1.1 DIGIT ZERO..DIGIT NINE
00B6..00B7 ; valid ; NV8 # 1.1 PILCROW SIGN..MIDDLE DOT
2474 ; disallowed_STD3_mapped ; 0028 0031 0029 ; NV8 # 1.1 PARENTHESIZED DIGIT ONE
3260 ; mapped ; 1100 ; NV8 # 1.1 CIRCLED HANGUL KIYEOK
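The mapping-table examples above could be read along the following lines. This is only a sketch under the assumption that the file keeps the layout shown in the examples: a code point or range, a status, an optional mapping of space-separated hex code points, an optional trailing NV8 field, and a "#" comment. The names MappingEntry and parse_mapping_line are hypothetical.

```python
from typing import NamedTuple, Optional


class MappingEntry(NamedTuple):
    first: int              # first code point of the range
    last: int               # last code point (same as first for a single code point)
    status: str             # e.g. "valid", "mapped", "disallowed_STD3_mapped"
    mapping: Optional[str]  # mapped string, or None if no mapping field is present
    nv8: bool               # True if the line carries the proposed NV8 field


def parse_mapping_line(line: str) -> Optional[MappingEntry]:
    """Parse one IdnaMappingTable.txt-style line (layout assumed from the examples)."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]

    # The proposed field is optional and, when present, comes last.
    nv8 = fields[-1] == "NV8"
    if nv8:
        fields = fields[:-1]

    # "0030..0039" is a range; "2474" is a single code point.
    first, _, last = fields[0].partition("..")

    mapping = None
    if len(fields) > 2 and fields[2]:
        mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())

    return MappingEntry(int(first, 16), int(last or first, 16), fields[1], mapping, nv8)


if __name__ == "__main__":
    samples = [
        "0030..0039 ; valid # 1.1 DIGIT ZERO..DIGIT NINE",
        "00B6..00B7 ; valid ; NV8 # 1.1 PILCROW SIGN..MIDDLE DOT",
        "2474 ; disallowed_STD3_mapped ; 0028 0031 0029 ; NV8 # 1.1 PARENTHESIZED DIGIT ONE",
        "3260 ; mapped ; 1100 ; NV8 # 1.1 CIRCLED HANGUL KIYEOK",
    ]
    for s in samples:
        print(parse_mapping_line(s))
```

Because the field is optional and trailing, a parser that reads only the first two or three fields is unaffected by its presence.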
We will note that NV8 does not take into account the BIDI, CONTEXTO, or CONTEXTJ tests, since those need to be applied to the complete context of a label.
2. It would be useful to generate a more comprehensive set of test cases, like we do for collation. It could include one sample for: