Mark Davis, 2009-07-08
I was working on the following action. We were not able to come to consensus in the editorial committee on how to proceed, so I'm bringing this to the UTC.
117 A040 | Mark Davis | Update PropertyAliases.txt and PropertyValueAliases.txt with the Unihan properties. L2/08-352 | UCD | 2008-11-12 | 2008-11-12
The document is: http://www.unicode.org/L2/L2008/08352-stability-prop.html. That document didn't specify the property names or aliases, and I also needed default values. I followed the analogy of kRSUnicode, which shows up as:

# Unicode_Radical_Stroke (URS)
# @missing: 0000..10FFFF; Unicode_Radical_Stroke; <none>

The CompatibilityVariant and the numerics have defaults for String and Numeric properties, however.

Note that we probably should list CompatibilityVariant as a derived property in #44, since it -- according to #38 -- is derived from the compat decomp in UnicodeData; and is thus (I presume) just filtered to only be CJK_Ideographs.
<BTW>
We really ought to have a FAQ explaining what the heck
the difference is between the properties Ideographic and
Unified_Ideograph, since it is pretty much impossible to
guess from the names. It appears that Ideographic is
really a derived property and equal to the following. Is
this relationship by intent or accident?
</BTW>
Unified Ideograph + HANGZHOU numerals + compat
ideographs + 3006 + 3007
cf:
http://unicode.org/cldr/utility/unicodeset.jsp?a=[:Ideographic:]&b=[:Unified_Ideograph:]
http://unicode.org/cldr/utility/unicodeset.jsp?a=[:Ideographic:]&b=[\u3006\u3007[:Unified_Ideograph:][:name=/HANGZHOU|CJK%20COMPATIBILITY%20IDEOGRAPH/:]]

On that basis, here was what I came up with.

--- Aliases ---

# ================================================
# Numeric Properties
# ================================================
CJK_AC ; CJK_AccountingNumeric ; kAccountingNumeric
CJK_ON ; CJK_OtherNumeric ; kOtherNumeric
CJK_PN ; CJK_PrimaryNumeric ; kPrimaryNumeric
...
# ================================================
# String Properties
# ================================================
...
CJK_CV ; CJK_CompatibilityVariant ; kCompatibilityVariant
...
# ================================================
# Miscellaneous Properties
# ================================================
IIC ; IICore ; kIICore
IRG_G ; IRG_GSource ; kIRG_GSource
IRG_H ; IRG_HSource ; kIRG_HSource
IRG_J ; IRG_JSource ; kIRG_JSource
IRG_K ; IRG_KSource ; kIRG_KSource
IRG_KP ; IRG_KPSource ; kIRG_KPSource
IRG_T ; IRG_TSource ; kIRG_TSource
IRG_U ; IRG_USource ; kIRG_USource
IRG_V ; IRG_VSource ; kIRG_VSource
...
URS ; Unicode_Radical_Stroke ; kRSUnicode

--- ValueAliases ---

# CJK_AccountingNumeric (CJK_AC)
# @missing: 0000..10FFFF; CJK_AccountingNumeric; NaN

# CJK_CompatibilityVariant (CJK_CV)
# @missing: 0000..10FFFF; CJK_CompatibilityVariant; <code point>

# CJK_OtherNumeric (CJK_ON)
# @missing: 0000..10FFFF; CJK_OtherNumeric; NaN

# CJK_PrimaryNumeric (CJK_PN)
# @missing: 0000..10FFFF; CJK_PrimaryNumeric; NaN

# IICore (IIC)
# @missing: 0000..10FFFF; IICore; <none>

# IRG_GSource (IRG_G)
# @missing: 0000..10FFFF; IRG_GSource; <none>

# IRG_HSource (IRG_H)
# @missing: 0000..10FFFF; IRG_HSource; <none>

# IRG_JSource (IRG_J)
# @missing: 0000..10FFFF; IRG_JSource; <none>

# IRG_KPSource (IRG_KP)
# @missing: 0000..10FFFF; IRG_KPSource; <none>

# IRG_KSource (IRG_K)
# @missing: 0000..10FFFF; IRG_KSource; <none>

# IRG_TSource (IRG_T)
# @missing: 0000..10FFFF; IRG_TSource; <none>

# IRG_USource (IRG_U)
# @missing: 0000..10FFFF; IRG_USource; <none>

# IRG_VSource (IRG_V)
# @missing: 0000..10FFFF; IRG_VSource; <none>

Here are some back and forths on this, edited heavily for brevity.
I would really really prefer that we don't invent any
new names, long or short.
For the long names, the kFoo names are perfectly adequate in my opinion, they are what the Unihan users know (at the very least, they have to be the "preferred" long names). For the short names, with 88 properties, they are bound to be impossible to remember (e.g. for me CJK_ON suggests kJapaneseOn rather than kOtherNumeric). ...
I can understand that, but we also need to be consistent
with what we've done already with kRSUnicode, and other
properties. Note that the kxxx names are retained as
aliases, and there'd be no problem with your continuing
to use them in the xml.
Having the k form (which can be very long) be the short alias is simply bizarre. As far as the UCD properties were concerned, the tags in Unihan were just gorp; this is the point at which we are really fully recognizing (some of) them as UCD properties, and we should give them consistent names, as we have *already done* with kRSUnicode. Note that with Unicode_Radical_Stroke, we didn't even use the k form as an *alias*; the official name was all and only the first two fields below. In the above, I added the 'k' form as an alias.

URS ; Unicode_Radical_Stroke ; kRSUnicode
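
To illustrate the practical effect of keeping the k form as an additional alias, here is a minimal Python sketch that reads alias lines and @missing lines in the format proposed above and resolves any of the three names to the long name. The function names (parse_aliases, parse_missing, resolve, normalize) are hypothetical, the sample data is only a small excerpt of the proposal, and the loose matching is a simplification assumed for illustration, not the full UCD matching rule.

```python
import re

# Excerpt of the proposed entries; the real files would contain many more.
ALIAS_LINES = """\
CJK_AC ; CJK_AccountingNumeric ; kAccountingNumeric
CJK_CV ; CJK_CompatibilityVariant ; kCompatibilityVariant
IRG_G ; IRG_GSource ; kIRG_GSource
URS ; Unicode_Radical_Stroke ; kRSUnicode
"""

MISSING_LINES = """\
# @missing: 0000..10FFFF; CJK_AccountingNumeric; NaN
# @missing: 0000..10FFFF; CJK_CompatibilityVariant; <code point>
# @missing: 0000..10FFFF; IRG_GSource; <none>
"""

MISSING_RE = re.compile(
    r"^#\s*@missing:\s*[0-9A-Fa-f]+\.\.[0-9A-Fa-f]+\s*;\s*([^;]+?)\s*;\s*(.+?)\s*$"
)


def normalize(name):
    """Simplified loose matching (assumption): ignore case, '_', '-', spaces."""
    return re.sub(r"[\s_\-]", "", name).lower()


def parse_aliases(text):
    """Map every name on a line (short, long, extra aliases) to the long name."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        long_name = fields[1]  # second field is the long name
        for name in fields:
            table[normalize(name)] = long_name
    return table


def parse_missing(text):
    """Map long property name to its default value from '# @missing:' lines."""
    defaults = {}
    for line in text.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            defaults[match.group(1)] = match.group(2)
    return defaults


ALIASES = parse_aliases(ALIAS_LINES)
DEFAULTS = parse_missing(MISSING_LINES)


def resolve(name):
    """Return the long property name for any alias, or None if unknown."""
    return ALIASES.get(normalize(name))


if __name__ == "__main__":
    for query in ("kRSUnicode", "URS", "cjk_cv", "kIRG_GSource"):
        long_name = resolve(query)
        print(query, "->", long_name, "default:", DEFAULTS.get(long_name))
```

With the k form kept as a third field, a Unihan user's lookup on kRSUnicode and a UCD user's lookup on URS land on the same long name, which is the behavior argued for above.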