L2/08-352
From: Mark Davis
To: The UTC
Re: Proposed Stability of Property and Property Value Aliases
Date: 2008-09-30
1. In practice, the UTC has been very careful to make sure that property names and aliases (and property value names and aliases) are stable, meaning that we keep old names in PropertyAliases or PropertyValueAliases. This is important for implementations that use property and property value aliases, such as regex engines, so that aliases don't disappear or come to denote a different property or property value in a new version of Unicode. That can be as bad for many implementations as the removal of a character from Unicode.
While that has been the effective practice, we do not have a formal policy for that. For example, the W3C group has recently communicated to us that "The schema working group is concerned that Unicode block names are not guaranteed to be stable and thus would like to include a specific version of Unicode as a normative reference. Our working group would prefer that they recognize future versions of Unicode so that new Unicode block names can be adopted as new blocks are assigned. In particular, the schema WG is concerned by the change of name from "Greek" to "Greek and Coptic" of the 0370..03FF block."
Of course, we do not want to see protocols, standards, or products be tied to an old version of Unicode. To this end, I propose that the UTC request the officers to add a stability policy for property and property value aliases, along the following lines:
Property and Property Value Stability
Applicable Version: Unicode 5.1+
Non-contributory normative and informative properties and their property values may be deprecated, but are never removed.
All property aliases constitute a single namespace. For each property, all of the property value aliases in PropertyValueAliases.txt constitute a separate namespace, one per property. Within each such namespace:
- property and property value aliases are guaranteed to be unambiguous: no aliases within the namespace will collide under the property matching rules specified in the UCD.
- non-contributory normative and informative property and property value aliases are permanent: they will never be removed.
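The unambiguity guarantee can be checked mechanically. The following is a minimal sketch, assuming a simplified form of the matching rules (ignore case, whitespace, underscores, and hyphens); the UCD remains the authority on the exact rules. The names `loose_key` and `find_collisions` and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def loose_key(alias: str) -> str:
    """Simplified matching key (an assumption, not the full UCD rule):
    drop whitespace, underscores, and hyphens, then fold case."""
    return re.sub(r"[\s_\-]", "", alias).lower()


def find_collisions(namespace: dict) -> dict:
    """`namespace` maps an identifier to its list of aliases.
    Returns every matching key that is claimed by more than one identifier."""
    owners = defaultdict(set)
    for identifier, aliases in namespace.items():
        for alias in aliases:
            owners[loose_key(alias)].add(identifier)
    return {key: ids for key, ids in owners.items() if len(ids) > 1}


# Hypothetical excerpt of the namespace of script property values.
script_values = {
    "Grek": ["Grek", "Greek"],
    "Hrkt": ["Hrkt", "Katakana_Or_Hiragana"],
}

# A deliberately broken namespace: two values whose aliases differ only
# in case and punctuation, which the matching rules treat as identical.
broken = {
    "A": ["Foo_Bar"],
    "B": ["foo-bar"],
}

print(find_collisions(script_values))
print(find_collisions(broken))
```

An empty result means no two identifiers in the namespace share an alias under the simplified rule.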
These guarantees make it possible to use these property aliases and property value aliases as stable identifiers (see for example UTS#18).
Thus, for example, if a regular expression implementation makes use of a property value alias like \p{block = Greek}, such an expression would continue to be valid according to the Unicode Standard, even if the preferred name changes (as it did) to \p{block = Greek_And_Coptic}. Even if there are no longer characters with those property values, such as \p{script = Katakana_Or_Hiragana}, the expression would remain valid: it would just match no characters.
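To illustrate how an implementation might honor both old and new aliases, here is a minimal sketch. The tables `ALIASES` and `MEMBERS`, the functions `resolve` and `matches`, and the simplified matching rule are all assumptions made for illustration; a real engine would load its data from the UCD files.

```python
import re


def loose_key(alias: str) -> str:
    """Simplified matching key (assumption): ignore case, whitespace, '_' and '-'."""
    return re.sub(r"[\s_\-]", "", alias).lower()


# Hypothetical alias table: (property, preferred value name) -> all aliases.
ALIASES = {
    ("block", "Greek_And_Coptic"): ["Greek_And_Coptic", "Greek"],
    ("script", "Katakana_Or_Hiragana"): ["Hrkt", "Katakana_Or_Hiragana"],
}

# Hypothetical membership data: code points carrying each property value.
MEMBERS = {
    ("block", "Greek_And_Coptic"): range(0x0370, 0x0400),
    ("script", "Katakana_Or_Hiragana"): range(0),  # no characters have this value
}

# Index every alias, old or new, under its loose-matching key.
INDEX = {
    (loose_key(prop), loose_key(name)): (prop, value)
    for (prop, value), names in ALIASES.items()
    for name in names
}

PATTERN = re.compile(r"\\p\{\s*([^=]+?)\s*=\s*([^}]+?)\s*\}")


def resolve(expr: str):
    """Map an expression such as \\p{block = Greek} to its (property, value) pair."""
    m = PATTERN.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a property expression: {expr!r}")
    return INDEX[(loose_key(m.group(1)), loose_key(m.group(2)))]


def matches(expr: str, ch: str) -> bool:
    """True if the character `ch` has the property value named by `expr`."""
    return ord(ch) in MEMBERS[resolve(expr)]
```

In this sketch the old and the new block alias resolve to the same entry, so an expression written against either name keeps working. An alias whose value no longer applies to any character still resolves and simply matches nothing.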
There are, however, no such guarantees for contributory properties or provisional properties.
For backwards compatibility, implementations should always support all of the aliases in PropertyAliases and PropertyValueAliases for any properties that they support, and should always follow the property matching rules specified in the UCD.
The operational effect of this policy for the UTC would be:
- We never remove a property or property value alias from PropertyAliases and PropertyValueAliases
- Even if a property is stabilized or deprecated, we never remove the aliases
- If we introduce a new preferred alias, we always keep the old ones around in PropertyAliases and PropertyValueAliases
  - For example, when we introduce a new property value alias "Inseparable" for what was "Inseperable", we maintain both names in PropertyValueAliases:
    lb ; IN ; Inseparable ; Inseperable
- We never allow collisions between aliases in the same namespace.
- We think carefully before we move a property from provisional to normative or informative, since it is then subject to these guarantees.
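The first three points above could be verified when a new version of the alias files is prepared. The following is a minimal sketch under stated assumptions: it handles only the simple `property ; short ; long ; ...` line shape shown in the example, it identifies a value by its first alias field, and it compares aliases exactly rather than under the matching rules. The function names are hypothetical.

```python
def parse_aliases(text: str) -> dict:
    """Map (property, alias) -> first alias field of the same line.
    Assumes semicolon-separated fields and '#' comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        prop, names = fields[0], fields[1:]
        for name in names:
            table[(prop, name)] = names[0]
    return table


def stability_violations(old_text: str, new_text: str) -> list:
    """List every old alias that was removed or now denotes a different value."""
    old, new = parse_aliases(old_text), parse_aliases(new_text)
    problems = []
    for key, value in old.items():
        if key not in new:
            problems.append(("removed", key))
        elif new[key] != value:
            problems.append(("retargeted", key))
    return problems


OLD = "lb ; IN ; Inseperable\n"
GOOD = "lb ; IN ; Inseparable ; Inseperable\n"  # old alias kept alongside the new one
BAD = "lb ; IN ; Inseparable\n"                 # old alias dropped

print(stability_violations(OLD, GOOD))
print(stability_violations(OLD, BAD))
```

Because a value is identified here by its first field, this sketch would misreport a change of short name; a fuller check would need a more robust notion of value identity.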
We have always followed these policies in practice, with extremely rare exceptions like Special_Casing_Condition. (But had we had this policy in place at the time we decided to remove it, we could always have maintained but deprecated it.)
This proposal applies only to those property values listed in PropertyValueAliases; that is to say, to the values of Catalog, Enumerated, or Binary properties. It does not stabilize constraints on non-enumerated values, such as those in http://unicode.org/Public/UNIDATA/UCD.html#Validating_Property_Values. We may or may not decide to stabilize the possible values for some of those in the future.
2. A separate proposal is that we make contributory properties a separate category, neither normative nor informative. These are, in Ken's words, "just derivational hacks aimed at keeping something *else* stable", and shouldn't be treated as either normative or informative, nor be subject to any guarantees of stability. That is conceptually easier for users.
(BTW, Jamo_Short_Name really should have been a contributory property...)
3. A separate proposal is that we add the non-provisional properties from Unihan (http://www.unicode.org/reports/tr38/) to PropertyAliases and PropertyValueAliases so that the above applies to all non-contributory normative and informative Unicode properties (but not to provisional or contributory properties). These are given in the list below.
Note that the following Unihan property is already there:
URS ; Unicode_Radical_Stroke
In so doing, we would assign each of these to one of the categories that we use for organizing properties.
- Numeric
- String
- Misc
- Catalog
- Enumerated
- Binary
Only those properties that were Catalog, Enumerated, or Binary would have property values listed in PropertyValueAliases, and be subject to the above stability constraints for property values and property value aliases.
Tag | Status
kAccountingNumeric | Informative
kOtherNumeric | Informative
kPrimaryNumeric | Informative
kRSUnicode | Informative
kCompatibilityVariant | Normative
kIICore | Normative
kIRG_GSource | Normative
kIRG_HSource | Normative
kIRG_JSource | Normative
kIRG_KPSource | Normative
kIRG_KSource | Normative
kIRG_TSource | Normative
kIRG_USource | Normative
kIRG_VSource | Normative