L2/08-108

Source: Mark Davis
Subject: Pattern Syntax in UAX31
Date: 2008-02-07

Please add this to the doc registry and agenda:

1. In UAX31, Section 3, Alternative Identifier Syntax

http://www.unicode.org/reports/tr31/tr31-8.html#Alternative_Identifier_Syntax

we define a way to have immutable identifiers, and have the following rule:

 

R2  Alternative Identifiers
 

To meet this requirement, an implementation shall define identifiers to be any string of characters that contains neither Pattern_White_Space nor Pattern_Syntax characters.

Alternatively, it shall declare that it uses a profile and define that profile with a precise list of characters that are added to or removed from the sets of code points defined by these properties.


However, this would allow characters that are General_Category=Private_Use, Surrogate, or Control. This was clearly not intended. Those General_Category values are all immutable, so they should be added to the list of exceptions. That is, replace the first paragraph with:

 
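To meet this requirement, an implementation shall define identifiers to be any non-empty string of characters that contains no character with any of the following property values:

    Pattern_White_Space=True
    Pattern_Syntax=True
    General_Category=Private_Use
    General_Category=Surrogate
    General_Category=Control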
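As a rough illustration of the proposed rule in item 1, the following Python sketch rejects the empty string and any string containing a Pattern_White_Space character, a Pattern_Syntax character, or a character whose General_Category is Private_Use, Surrogate, or Control. The function name is_alternative_identifier and the pattern_syntax parameter are hypothetical. Python's standard library does not expose Pattern_Syntax, so the default set below covers only the ASCII portion of that property and is an assumption made for illustration; a real implementation would load the full set from the Unicode Character Database.

```python
import unicodedata

# Pattern_White_Space is a small, fixed set of code points.
PATTERN_WHITE_SPACE = frozenset(
    [*range(0x0009, 0x000E), 0x0020, 0x0085, 0x200E, 0x200F, 0x2028, 0x2029]
)

# ASSUMPTION: only the ASCII portion of Pattern_Syntax, for illustration.
# The full property contains many more code points (see PropList.txt).
ASCII_PATTERN_SYNTAX = frozenset(
    [
        *range(0x21, 0x30),  # ! " # $ % & ' ( ) * + , - . /
        *range(0x3A, 0x41),  # : ; < = > ? @
        *range(0x5B, 0x5F),  # [ \ ] ^   (underscore U+005F is not included)
        0x60,                # `
        *range(0x7B, 0x7F),  # { | } ~
    ]
)

# Short General_Category codes for Private_Use, Surrogate, and Control.
EXCLUDED_CATEGORIES = frozenset({"Co", "Cs", "Cc"})


def is_alternative_identifier(s, pattern_syntax=ASCII_PATTERN_SYNTAX):
    """Hypothetical check for the proposed R2 wording.

    `pattern_syntax` is a caller-supplied set of code points standing in
    for the Pattern_Syntax property.
    """
    if not s:
        return False  # the proposed wording requires a non-empty string
    for ch in s:
        cp = ord(ch)
        if cp in PATTERN_WHITE_SPACE or cp in pattern_syntax:
            return False
        if unicodedata.category(ch) in EXCLUDED_CATEGORIES:
            return False
    return True
```

Under the original wording, a string such as "a\x00b" (containing a Control character) would have qualified as an identifier; with the three added General_Category exclusions this sketch rejects it.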
 

2. We should also add a note that, while the property values Default_Ignorable_Code_Point=True and White_Space=True are not immutable, an implementation should exclude from its profile the characters that have those values in the latest version of Unicode at the time the profile is adopted. (And explain why this is a good policy for the profile.) This applies to both R2 and R3.


3. We should ask the officers to add a stability policy for the General_Category values Private_Use, Surrogate, and Control, confirming that they are immutable: the sets of characters with those values will neither grow nor shrink in future versions of Unicode.

--
Mark