We need to make sure that people understand that certain special ranges of
Unicode code points have restrictions on their contents, for stability. While
some of these restrictions are captured in the WG2 Principles and Procedures
(http://www.dkuug.dk/JTC1/SC2/WG2/docs/n3452.pdf), others are missing or
unclear. I suggest adding new sections:
D.2.6. Reserved code points for BIDI
D.2.7. Reserved code points for Default Ignorable Code Points
In addition, if people have programs for checking other consistency issues
for new code points, like name collisions, we should encourage them to add
tests for these as well, to at least flag those cases.
Details
1. There is a range allocated for non-identifier characters -- nothing
suitable for use in identifiers is encoded there.
Properties: Pattern_Syntax
http://unicode.org/reports/tr31/#Default_Identifier_Syntax
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Pattern_Syntax:]
There is already a section in N3452 on this:
D.2.5 Reserved code points for stability of identifiers.
Despite that, the range currently contains one character that would otherwise
qualify as an identifier character: U+2E2F ( ⸯ ) VERTICAL TILDE, allocated in
Unicode 5.1. This particular instance is not a problem, but it should not be
repeated.
2. There is a set of ranges allocated for Bidi (right-to-left) characters - no
left-to-right characters [:bc=L:] are encoded there.
Ranges:
U+0590…U+08FF,
U+FB1D…U+FB4F,
U+10800…U+10FFF,
plus [:blk=Arabic Presentation Forms A:][:blk=Arabic Presentation Forms B:]
http://unicode.org/reports/tr9/#Directional_Formatting_Codes
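A check for item 2 could be added to an existing consistency checker along the following lines. This is only a minimal sketch: the names BIDI_RESERVED_RANGES, in_bidi_reserved, and flag_bidi_conflicts are hypothetical, the bidi class of each proposed character is assumed to come from the proposal data, and the two Arabic Presentation Forms blocks are written out as their block extents (U+FB50…U+FDFF and U+FE70…U+FEFF).

```python
# Reserved Bidi ranges from item 2 (inclusive bounds).
# All names here are hypothetical, chosen for illustration only.
BIDI_RESERVED_RANGES = [
    (0x0590, 0x08FF),
    (0xFB1D, 0xFB4F),
    (0x10800, 0x10FFF),
    (0xFB50, 0xFDFF),  # block: Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # block: Arabic Presentation Forms-B
]


def in_bidi_reserved(cp):
    """Return True if the code point lies in a reserved Bidi range."""
    return any(lo <= cp <= hi for lo, hi in BIDI_RESERVED_RANGES)


def flag_bidi_conflicts(proposed):
    """Return the proposed code points that break the rule.

    `proposed` is an iterable of (code_point, bidi_class) pairs, where
    bidi_class is the short Bidi_Class value such as "L", "R", or "AL".
    A character is flagged when it has bc=L and falls in a reserved range.
    """
    return [cp for cp, bc in proposed if bc == "L" and in_bidi_reserved(cp)]
```

For example, flag_bidi_conflicts([(0x08B0, "L"), (0x08B1, "AL")]) would flag only the first entry, since a left-to-right character is being proposed inside U+0590…U+08FF.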
3. There is a range allocated for default ignorable (DI) characters - no other
characters are allocated there, and all new DI characters should be allocated
there.
Ranges:
U+2065…U+2069,
U+FFF0…U+FFF8,
U+E0000…U+E0FFF
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:di:]&[:cn:]]
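The rule in item 3 cuts both ways, so a test for it would need to flag two cases: a non-DI character proposed inside the reserved ranges, and a new DI character proposed outside them. The following is a rough sketch under that reading; DI_RESERVED_RANGES, in_di_reserved, and check_default_ignorable are hypothetical names, and the Default_Ignorable_Code_Point value of each proposed character is assumed to be supplied by the proposal data.

```python
# Reserved default ignorable ranges from item 3 (inclusive bounds).
# All names here are hypothetical, chosen for illustration only.
DI_RESERVED_RANGES = [
    (0x2065, 0x2069),
    (0xFFF0, 0xFFF8),
    (0xE0000, 0xE0FFF),
]


def in_di_reserved(cp):
    """Return True if the code point lies in a reserved DI range."""
    return any(lo <= cp <= hi for lo, hi in DI_RESERVED_RANGES)


def check_default_ignorable(cp, is_di):
    """Return a warning string for a proposed character, or None if it is fine.

    `is_di` is the intended Default_Ignorable_Code_Point value (True/False).
    """
    inside = in_di_reserved(cp)
    if inside and not is_di:
        return "U+%04X: non-DI character proposed inside a reserved DI range" % cp
    if is_di and not inside:
        return "U+%04X: DI character proposed outside the reserved DI ranges" % cp
    return None
```

A checker would call check_default_ignorable once per proposed code point and report any non-None result, which flags the case for review rather than rejecting it outright.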
Mark