L2/08-108
Source: Mark Davis
Subject: Pattern Syntax in UAX31
Date: 2008-02-07
Please add this to the doc registry and agenda:
1. In UAX31, Section 3, Alternative Identifier Syntax
(http://www.unicode.org/reports/tr31/tr31-8.html#Alternative_Identifier_Syntax),
we define a way to have immutable identifiers, with the following rule:
R2 Alternative Identifiers
To meet this requirement, an implementation shall define identifiers to be any
string of characters that contains neither Pattern_White_Space nor
Pattern_Syntax characters.
Alternatively, it shall declare that it uses a profile and define that profile
with a precise list of characters that are added to or removed from the sets
of code points defined by these properties.
However, this would allow characters with General_Category=Private_Use,
Surrogate, or Control. That was clearly not intended: the sets of characters
with those values are also immutable, so they should be added to the list of
exclusions. That is, replace the first paragraph with:
To meet this requirement, an implementation shall define
identifiers to be any non-empty string of characters that contains no
character with any of the following property values:
- Pattern_White_Space=True,
- Pattern_Syntax=True, or
- General_Category=Private_Use, Surrogate, or Control.
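To make the difference between the two wordings concrete, here is a minimal Python sketch that checks a string against R2 as currently worded and against the proposed replacement. It assumes the Pattern_Syntax ranges below, transcribed by hand from PropList.txt; a real implementation should load them from the UCD version it targets. The function names are hypothetical, and General_Category comes from Python's standard unicodedata module.

```python
import unicodedata

# Pattern_White_Space: 11 code points (stated to be immutable in UAX31).
PATTERN_WHITE_SPACE = frozenset(
    chr(cp) for cp in (0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85,
                       0x200E, 0x200F, 0x2028, 0x2029)
)

# Pattern_Syntax ranges, hand-transcribed from PropList.txt (an assumption
# of this sketch; verify against the UCD version you target).
PATTERN_SYNTAX_RANGES = (
    (0x0021, 0x002F), (0x003A, 0x0040), (0x005B, 0x005E), (0x0060, 0x0060),
    (0x007B, 0x007E), (0x00A1, 0x00A7), (0x00A9, 0x00A9), (0x00AB, 0x00AC),
    (0x00AE, 0x00AE), (0x00B0, 0x00B1), (0x00B6, 0x00B6), (0x00BB, 0x00BB),
    (0x00BF, 0x00BF), (0x00D7, 0x00D7), (0x00F7, 0x00F7), (0x2010, 0x2027),
    (0x2030, 0x203E), (0x2041, 0x2053), (0x2055, 0x205E), (0x2190, 0x245F),
    (0x2500, 0x2775), (0x2794, 0x2BFF), (0x2E00, 0x2E7F), (0x3001, 0x3003),
    (0x3008, 0x3020), (0x3030, 0x3030), (0xFD3E, 0xFD3F), (0xFE45, 0xFE46),
)

# General_Category values added by the proposed wording:
# Co = Private_Use, Cs = Surrogate, Cc = Control.
PROPOSED_EXTRA_CATEGORIES = frozenset({"Co", "Cs", "Cc"})


def is_pattern_syntax(ch: str) -> bool:
    """True if ch falls in one of the transcribed Pattern_Syntax ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PATTERN_SYNTAX_RANGES)


def is_r2_identifier(s: str) -> bool:
    """R2 as currently worded: no Pattern_White_Space or Pattern_Syntax."""
    return not any(ch in PATTERN_WHITE_SPACE or is_pattern_syntax(ch) for ch in s)


def is_proposed_identifier(s: str) -> bool:
    """Proposed wording: non-empty, and also no Co, Cs, or Cc characters."""
    return bool(s) and is_r2_identifier(s) and not any(
        unicodedata.category(ch) in PROPOSED_EXTRA_CATEGORIES for ch in s
    )


for sample in ["abc", "a+b", "a\u0007b", "\ue000", ""]:
    print(repr(sample), is_r2_identifier(sample), is_proposed_identifier(sample))
```

Under the current wording, "a\u0007b" (containing the control character BEL), "\ue000" (a private-use character), and even the empty string all qualify as identifiers; the proposed wording rejects all three while still accepting "abc".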
2. We should also add a note that, although the property values
Default_Ignorable_Code_Point=True and White_Space=True are not immutable, an
implementation's profile should exclude the characters that have those values
in the latest version of Unicode at the time the profile is adopted. (And
explain why this is a good policy for the profile.) This applies to both R2
and R3.
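One way to read "the latest version of Unicode at the time the profile is adopted" is as a snapshot: parse the property data once, freeze the resulting code point set, and store it with the profile. The Python sketch below assumes lines in the UCD property-file format ("code point or range ; property # comment", as used by PropList.txt and DerivedCoreProperties.txt); the sample lines and function names are illustrative, and the check covers only this extra profile exclusion, to be applied on top of the R2 or R3 test.

```python
# Illustrative sample lines in the UCD property-file format.
SAMPLE_UCD_LINES = [
    "0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>",
    "0020          ; White_Space # Zs       SPACE",
    "00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN",
    "200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK",
    "",
    "# comment and blank lines are skipped",
]

PROFILE_PROPERTIES = ("White_Space", "Default_Ignorable_Code_Point")


def snapshot_exclusions(lines, properties=PROFILE_PROPERTIES):
    """Parse property-file lines once and freeze every listed code point."""
    excluded = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        field, _, prop = (part.strip() for part in data.partition(";"))
        if prop not in properties:
            continue
        lo, _, hi = field.partition("..")  # single code point or range
        excluded.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return frozenset(excluded)


def passes_profile_exclusions(s, frozen):
    """True if no character of s is in the frozen snapshot."""
    return not any(ord(ch) in frozen for ch in s)


FROZEN = snapshot_exclusions(SAMPLE_UCD_LINES)
print(len(FROZEN), passes_profile_exclusions("ab", FROZEN),
      passes_profile_exclusions("a\u200bb", FROZEN))
```

The design choice is that the frozen set is computed at adoption time and never recomputed from whatever UCD is installed later, so a future Unicode version that adds White_Space or Default_Ignorable_Code_Point characters does not change which strings the profile accepts.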
3. We should ask the officers to add a stability policy for the
General_Category values Private_Use, Surrogate, and Control, confirming that
they are immutable: the sets of code points with these values will neither
grow nor shrink in future versions of Unicode.
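As a rough spot check of the claim behind this request (not a substitute for a policy), Python's standard unicodedata module happens to bundle two UCD versions: Unicode 3.2.0 as unicodedata.ucd_3_2_0, and the interpreter's current version. The sketch below collects the Private_Use, Surrogate, and Control code points from each and compares them; the helper names are hypothetical, and the results reflect only these two versions.

```python
import unicodedata

# Co = Private_Use, Cs = Surrogate, Cc = Control.
CATEGORIES = ("Co", "Cs", "Cc")


def category_sets(db):
    """Collect code points per category from a unicodedata-style database."""
    sets = {gc: set() for gc in CATEGORIES}
    for cp in range(0x110000):
        gc = db.category(chr(cp))
        if gc in sets:
            sets[gc].add(cp)
    return sets


def compare(old_db, new_db):
    """Map each category to (old count, new count, identical sets?)."""
    old, new = category_sets(old_db), category_sets(new_db)
    return {gc: (len(old[gc]), len(new[gc]), old[gc] == new[gc])
            for gc in CATEGORIES}


REPORT = compare(unicodedata.ucd_3_2_0, unicodedata)
for gc, (n_old, n_new, same) in REPORT.items():
    print(f"{gc}: 3.2.0={n_old}, {unicodedata.unidata_version}={n_new}, "
          f"identical={same}")
```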
--
Mark