L2/12-074R1
Subject: Property Metadata: Status
To: UTC
From: Mark Davis
Date: 2012-02-05 (revised 2012-11-06)
Live doc: http://goo.gl/wMEbd
In http://www.unicode.org/L2/L2010/10052-metaprop.txt, Ken presented a proposal for property metadata. While there was a lot of value to that proposal, overall it is rather complicated. I think we can make incremental progress by starting with the status metaproperties, as described below.
Right now, we have quite a number of different metaproperty definitions that are a “kind of status”, including what we call status in UAX #44:
Third Column. This column indicates the status of the property: Normative or Informative or Contributory or Provisional.
We also have the related features Immutable, Deprecated, Contributory, Stabilized, Overridable, and Obsolete. However, when you look at the use in practice, some features and combinations of features are simply not necessary. I think with a few small changes we can have a much simpler, more understandable, and more useful overall picture to present to developers. Some of these recommendations could be done in v6.3, while others might wait for v7.0.
The data used here is from v6.1, scraped from UAX #44 because we don’t have machine-readable files.
In detail:
1. Overridable
There is only one property indicated in the standard to be Overridable: Canonical_Decomposition. Moreover, (a) we don’t give any real indication of the actual implementation considerations (cf. 12-075), and (b) the real purpose of “overridable” was to make an algorithm stable, and we have achieved that with the stability policies for normalization. So Overridable is unnecessary.
Recommendation: remove Overridable
2. Obsolete & Stabilized
There are only two properties that qualify as Stabilized, and only one as Obsolete:
ISO_Comment; Informative; Deprecated; Stabilized; Obsolete
Hyphen; Informative; Deprecated; Stabilized
These two characteristics don’t really add any useful information for implementers (above and beyond Deprecated) and those properties are already Deprecated. So they are unnecessary; we can just use Deprecated.
Recommendation: remove Obsolete & Stabilized
3. Deprecated & Informative
There are then a small number of Deprecated properties, and they are all Informative: FC_NFKC_Closure, Expands_On_NFC, Expands_On_NFD, Expands_On_NFKC, Expands_On_NFKD, Grapheme_Link, Hyphen, ISO_Comment. Yet the Informative status doesn’t add anything once something is Deprecated: the Deprecated feature trumps Informative. The picture is simpler if Deprecated is folded into the single Status.
Recommendation: make Deprecated into a Status value.
4. Contributory & Immutable
There is only one such property:
Jamo_Short_Name; Contributory; Immutable
For Contributory properties, the feature “Immutable” is not important. The property that they contribute to is the one where the status matters. The Name property is the one that is important to be Immutable, not anything that contributes to it. If Immutable is removed from Jamo_Short_Name, then the way is paved for making Immutable just another status value, a type of Normative. And it doesn’t seem important to be able to have an Informative property be Immutable; if it is that important that it be Immutable, it should be Normative.
Recommendation: make Immutable into a Status value.
Note: I think this is less important; although it makes sense to me to combine Immutable into a Status value, it wouldn’t be too bad to retain Immutable as a separate metaproperty either.
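Taken together, recommendations 1 through 4 amount to a mechanical mapping from the current status-plus-features description of a property to a single value. The following is a minimal Python sketch of that mapping, not part of the proposal itself. The function name collapse_status and the representation of features as a set of strings are assumptions made for illustration. Leaving an Informative property that is marked Immutable as Informative is also an assumption, since no such property exists in the v6.1 data.

```python
# Hypothetical helper: fold the current UAX #44 status plus extra features
# into the single status value described in recommendations 1-4.

# Features that recommendations 1 and 2 propose to drop outright.
DROPPED = {"Overridable", "Obsolete", "Stabilized"}


def collapse_status(status, features=()):
    """Return one status value for a property.

    status   -- current status: Normative, Informative, Contributory, Provisional
    features -- iterable of extra features such as Immutable or Deprecated
    """
    remaining = set(features) - DROPPED
    # Recommendation 3: Deprecated trumps whatever status the property had.
    if "Deprecated" in remaining:
        return "Deprecated"
    # Recommendation 4: Immutable is not important for Contributory properties.
    if status == "Contributory":
        return "Contributory"
    # Recommendation 4: Immutable becomes a status value, a type of Normative.
    if "Immutable" in remaining and status == "Normative":
        return "Immutable"
    # Assumption: anything else keeps its current status unchanged.
    return status


# ISO_Comment; Informative; Deprecated; Stabilized; Obsolete
print(collapse_status("Informative", {"Deprecated", "Stabilized", "Obsolete"}))
# Jamo_Short_Name; Contributory; Immutable
print(collapse_status("Contributory", {"Immutable"}))
```

With these rules ISO_Comment comes out as Deprecated and Jamo_Short_Name as Contributory, which matches the values listed for them in the proposed file in section 5.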
5. Property_Status
What I propose once we have 1-4 is an enumerated metaproperty called Property_Status, with values {Immutable, Normative, Informative, Provisional, Contributory, Deprecated}, contained in a text file called PropertyStatus.txt. We’d also need to touch up some parts of UAX #44 and Chapter 3 to reflect the above changes.
Here is a proposal for the initial contents of that file. Of course, as we add more properties or change their status, we’d record those changes in the file.
# PropertyName; Status
Decomposition_Mapping; Immutable
Name; Immutable
Canonical_Combining_Class; Immutable
Pattern_Syntax; Immutable
Pattern_White_Space; Immutable
Numeric_Value; Normative
Case_Folding; Normative
Simple_Case_Folding; Normative
Simple_Lowercase_Mapping; Normative
Simple_Titlecase_Mapping; Normative
Simple_Uppercase_Mapping; Normative
kCompatibilityVariant; Normative
Name_Alias; Normative
kIICore; Normative
kIRG_GSource; Normative
kIRG_HSource; Normative
kIRG_JSource; Normative
kIRG_KPSource; Normative
kIRG_KSource; Normative
kIRG_MSource; Normative
kIRG_TSource; Normative
kIRG_USource; Normative
kIRG_VSource; Normative
Age; Normative
Block; Normative
Bidi_Class; Normative
Decomposition_Type; Normative
General_Category; Normative
Hangul_Syllable_Type; Normative
Joining_Group; Normative
Joining_Type; Normative
Line_Break; Normative
NFC_Quick_Check; Normative
NFD_Quick_Check; Normative
NFKC_Quick_Check; Normative
NFKD_Quick_Check; Normative
Numeric_Type; Normative
ASCII_Hex_Digit; Normative
Bidi_Control; Normative
Bidi_Mirrored; Normative
Composition_Exclusion; Normative
Default_Ignorable_Code_Point; Normative
Deprecated; Normative
Full_Composition_Exclusion; Normative
Grapheme_Base; Normative
Grapheme_Extend; Normative
IDS_Binary_Operator; Normative
IDS_Trinary_Operator; Normative
Join_Control; Normative
Logical_Order_Exception; Normative
Noncharacter_Code_Point; Normative
Radical; Normative
Soft_Dotted; Normative
Unified_Ideograph; Normative
Variation_Selector; Normative
White_Space; Normative
kAccountingNumeric; Informative
kOtherNumeric; Informative
kPrimaryNumeric; Informative
Lowercase_Mapping; Informative
NFKC_Casefold; Informative
Titlecase_Mapping; Informative
Uppercase_Mapping; Informative
Bidi_Mirroring_Glyph; Informative
Script_Extensions; Informative
Unicode_1_Name; Informative
kMandarin; Informative
kRSUnicode; Informative
kTotalStrokes; Informative
Script; Informative
East_Asian_Width; Informative
Grapheme_Cluster_Break; Informative
Sentence_Break; Informative
Word_Break; Informative
Alphabetic; Informative
Case_Ignorable; Informative
Cased; Informative
Changes_When_Casefolded; Informative
Changes_When_Casemapped; Informative
Changes_When_Lowercased; Informative
Changes_When_NFKC_Casefolded; Informative
Changes_When_Titlecased; Informative
Changes_When_Uppercased; Informative
Dash; Informative
Diacritic; Informative
Extender; Informative
Hex_Digit; Informative
ID_Continue; Informative
ID_Start; Informative
Ideographic; Informative
Lowercase; Informative
Math; Informative
Quotation_Mark; Informative
STerm; Informative
Terminal_Punctuation; Informative
Uppercase; Informative
XID_Continue; Informative
XID_Start; Informative
CJK_Radical; Provisional
Emoji_DCM; Provisional
Emoji_KDDI; Provisional
Emoji_SB; Provisional
Named_Sequences; Provisional
Named_Sequences_Prov; Provisional
Standardized_Variant; Provisional
kBigFive; Provisional
kCCCII; Provisional
kCNS1986; Provisional
kCNS1992; Provisional
kCangjie; Provisional
kCantonese; Provisional
kCheungBauer; Provisional
kCheungBauerIndex; Provisional
kCihaiT; Provisional
kCowles; Provisional
kDaeJaweon; Provisional
kDefinition; Provisional
kEACC; Provisional
kFenn; Provisional
kFennIndex; Provisional
kFourCornerCode; Provisional
kFrequency; Provisional
kGB0; Provisional
kGB1; Provisional
kGB3; Provisional
kGB5; Provisional
kGB7; Provisional
kGB8; Provisional
kGSR; Provisional
kGradeLevel; Provisional
kHDZRadBreak; Provisional
kHKGlyph; Provisional
kHKSCS; Provisional
kHanYu; Provisional
kHangul; Provisional
kHanyuPinlu; Provisional
kHanyuPinyin; Provisional
kIBMJapan; Provisional
kIRGDaeJaweon; Provisional
kIRGDaiKanwaZiten; Provisional
kIRGHanyuDaZidian; Provisional
kIRGKangXi; Provisional
kJIS0213; Provisional
kJapaneseKun; Provisional
kJapaneseOn; Provisional
kJis0; Provisional
kJis1; Provisional
kKPS0; Provisional
kKPS1; Provisional
kKSC0; Provisional
kKSC1; Provisional
kKangXi; Provisional
kKarlgren; Provisional
kKorean; Provisional
kLau; Provisional
kMainlandTelegraph; Provisional
kMatthews; Provisional
kMeyerWempe; Provisional
kMorohashi; Provisional
kNelson; Provisional
kPhonetic; Provisional
kPseudoGB1; Provisional
kRSAdobe_Japan1_6; Provisional
kRSJapanese; Provisional
kRSKanWa; Provisional
kRSKangXi; Provisional
kRSKorean; Provisional
kSBGY; Provisional
kSemanticVariant; Provisional
kSimplifiedVariant; Provisional
kSpecializedSemanticVariant; Provisional
kTaiwanTelegraph; Provisional
kTang; Provisional
kTraditionalVariant; Provisional
kVietnamese; Provisional
kXHC1983; Provisional
kXerox; Provisional
kZVariant; Provisional
Indic_Matra_Category; Provisional
Indic_Syllabic_Category; Provisional
Jamo_Short_Name; Contributory
Other_Alphabetic; Contributory
Other_Default_Ignorable_Code_Point; Contributory
Other_Grapheme_Extend; Contributory
Other_ID_Continue; Contributory
Other_ID_Start; Contributory
Other_Lowercase; Contributory
Other_Math; Contributory
Other_Uppercase; Contributory
FC_NFKC_Closure; Deprecated
ISO_Comment; Deprecated
Expands_On_NFC; Deprecated
Expands_On_NFD; Deprecated
Expands_On_NFKC; Deprecated
Expands_On_NFKD; Deprecated
Grapheme_Link; Deprecated
Hyphen; Deprecated
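
Because the proposed PropertyStatus.txt is a plain "PropertyName; Status" list, reading it takes little code. The following Python sketch parses lines in the format shown above and rejects any status outside the six proposed values. The names PropertyStatus and parse_property_status are hypothetical, and treating "#" as a comment marker is assumed from the header line and from the convention of other UCD data files.

```python
from enum import Enum


class PropertyStatus(Enum):
    """The six values proposed for the Property_Status metaproperty."""
    IMMUTABLE = "Immutable"
    NORMATIVE = "Normative"
    INFORMATIVE = "Informative"
    PROVISIONAL = "Provisional"
    CONTRIBUTORY = "Contributory"
    DEPRECATED = "Deprecated"


def parse_property_status(lines):
    """Map property names to PropertyStatus values.

    Assumes one 'PropertyName; Status' entry per line, with '#' starting
    a comment that runs to the end of the line.
    """
    result = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'PropertyName; Status'")
        name, value = fields
        # Enum lookup by value raises ValueError for an unknown status.
        result[name] = PropertyStatus(value)
    return result


# A few entries copied from the proposed file contents.
SAMPLE = """\
# PropertyName; Status
Name; Immutable
Age; Normative
Script; Informative
kDefinition; Provisional
Other_Math; Contributory
Hyphen; Deprecated
"""

statuses = parse_property_status(SAMPLE.splitlines())
print(statuses["Hyphen"].value)
```

Using an enumeration rather than free strings means a misspelled or retired value such as Stabilized fails loudly at load time instead of slipping through.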