L2/12-310
Mark Davis
Live Document: http://goo.gl/KIMTs
We provide a crisp and useful definition of Numeric_Type=Decimal. However, we have not provided crisp and useful definitions of the other two types, Numeric_Type=Digit and Numeric_Type=Numeric. We have also drifted away from the (few) characterizations that we had in the text. This issue was raised at the last UTC meeting, so I spent some time looking at the current contents and at what would make a coherent definition.
Proposal
Put the following proposal up for public review, targeted at Unicode 6.2.1.
1. Add the following definitions
(The underlined examples would be changed from current properties.)
Numeric_Type=Decimal
Characters used in positional decimal systems, which are standard base-10 radix systems with contiguous digits 0..9, written most-significant-digit first (in backing-store order). These are coextensive by definition with General_Category=Decimal_Number.
Rationale: This is simply a formulation of conditions that we already have.
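The claim that Numeric_Type=Decimal is coextensive with General_Category=Decimal_Number can be spot-checked against whatever version of the Unicode Character Database a given runtime ships. The following is a minimal sketch using Python's standard unicodedata module. The function name decimal_gc_mismatches is hypothetical, and the result reflects the UCD version bundled with the interpreter, not necessarily the version this proposal targets.

```python
import sys
import unicodedata

def decimal_gc_mismatches():
    """Return code points where 'has a decimal value' and gc=Nd disagree."""
    mismatches = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # unicodedata.decimal() is defined only for Numeric_Type=Decimal.
        has_decimal = unicodedata.decimal(ch, None) is not None
        is_nd = unicodedata.category(ch) == "Nd"
        if has_decimal != is_nd:
            mismatches.append(cp)
    return mismatches

if __name__ == "__main__":
    bad = decimal_gc_mismatches()
    print("UCD version:", unicodedata.unidata_version)
    print("mismatches:", [f"U+{cp:04X}" for cp in bad])
```

An empty mismatch list is what the definition above predicts.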
Numeric_Type=Digit
Variants of positional decimal characters (Numeric_Type=Decimal) or sequences thereof. These include superscripts and subscripts, enclosed forms, and forms decorated by the addition of characters such as parentheses, dots, or commas.
Examples:
U+2080 ( ₀ ) SUBSCRIPT ZERO
U+2460 ( ① ) CIRCLED DIGIT ONE
U+2469 ( ⑩ ) CIRCLED NUMBER TEN
Rationale: This provides a cohesive, useful definition. It does not break up series of related numbers such as the circled Western numbers, and it does not include non-decimal numbers such as Ethiopic.
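One rough way to explore the proposed Digit definition is to ask whether a character's compatibility decomposition consists only of decimal digits plus decorations. The sketch below is a heuristic under stated assumptions, not part of the proposal: the helper name looks_like_decorated_decimal and the exact DECORATIONS set are hypothetical. It also misses characters that have no compatibility decomposition, such as the dingbat circled numbers, so it can only suggest candidates.

```python
import unicodedata

# Decorations mentioned in the definition; the exact set is an assumption.
DECORATIONS = set("().,")

def looks_like_decorated_decimal(ch):
    """Heuristic: ch decomposes (NFKD) to decimal digits plus decorations."""
    if unicodedata.category(ch) == "Nd":
        # Plain decimal digits are Numeric_Type=Decimal, not Digit.
        return False
    expanded = unicodedata.normalize("NFKD", ch)
    core = [c for c in expanded if c not in DECORATIONS]
    # Require at least one digit, and nothing but digits once
    # the decorations have been stripped.
    return bool(core) and all(unicodedata.category(c) == "Nd" for c in core)

if __name__ == "__main__":
    for cp in (0x2080, 0x2460, 0x2469, 0x247D, 0x1369, 0x2150, 0x2160):
        ch = chr(cp)
        print(f"U+{cp:04X}", unicodedata.name(ch), looks_like_decorated_decimal(ch))
```

Under this heuristic, the vulgar fractions are rejected because their decompositions contain U+2044 FRACTION SLASH, and the Roman numerals are rejected because they decompose to Latin letters.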
Numeric_Type=Numeric
Characters with numeric value, but that are neither Decimal nor Digit.
Examples:
U+2150 ( ⅐ ) VULGAR FRACTION ONE SEVENTH
U+2160 ( Ⅰ ) ROMAN NUMERAL ONE
U+1369 ( ፩ ) ETHIOPIC DIGIT ONE
U+1372 ( ፲ ) ETHIOPIC NUMBER TEN
U+0D72 ( ൲ ) MALAYALAM NUMBER ONE THOUSAND
U+3021 ( 〡 ) HANGZHOU NUMERAL ONE
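For comparison with the proposed definitions, the currently assigned Numeric_Type of the example characters can be read from a runtime's copy of the UCD. This is a minimal sketch using Python's unicodedata module, where decimal() is defined only for Decimal, digit() for Decimal and Digit, and numeric() for all three types. The function name numeric_type is hypothetical, and the output reflects the UCD version bundled with the interpreter rather than the assignments proposed here.

```python
import unicodedata

def numeric_type(ch):
    """Return the Numeric_Type of ch according to the bundled UCD."""
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

if __name__ == "__main__":
    examples = (0x0037, 0x2080, 0x2460, 0x2469, 0x2150,
                0x2160, 0x1369, 0x1372, 0x0D72, 0x3021)
    for cp in examples:
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)}: {numeric_type(ch)}")
```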
2. Change NT properties for certain characters, consistent with the above.
A. from Numeric_Type=Digit to Numeric_Type=Numeric
U+10E60 ( 𐹠 ) RUMI DIGIT ONE...U+10E68 ( 𐹨 ) RUMI DIGIT NINE
U+11052 ( 𑁒 ) BRAHMI NUMBER ONE...U+1105A ( 𑁚 ) BRAHMI NUMBER NINE
U+1369 ( ፩ ) ETHIOPIC DIGIT ONE...U+1371 ( ፱ ) ETHIOPIC DIGIT NINE
U+10A40 ( 𐩀 ) KHAROSHTHI DIGIT ONE...U+10A43 ( 𐩃 ) KHAROSHTHI DIGIT FOUR
U+19DA ( ᧚ ) NEW TAI LUE THAM DIGIT ONE
B. from Numeric_Type=Numeric to Numeric_Type=Digit
U+2469 ( ⑩ ) CIRCLED NUMBER TEN...U+2473 ( ⑳ ) CIRCLED NUMBER TWENTY
U+247D ( ⑽ ) PARENTHESIZED NUMBER TEN...U+2487 ( ⒇ ) PARENTHESIZED NUMBER TWENTY
U+2491 ( ⒑ ) NUMBER TEN FULL STOP...U+249B ( ⒛ ) NUMBER TWENTY FULL STOP
U+277F ( ❿ ) DINGBAT NEGATIVE CIRCLED NUMBER TEN
U+24EB ( ⓫ ) NEGATIVE CIRCLED NUMBER ELEVEN...U+24F4 ( ⓴ ) NEGATIVE CIRCLED NUMBER TWENTY
U+3251 ( ㉑ ) CIRCLED NUMBER TWENTY ONE...U+32BF ( ㊿ ) CIRCLED NUMBER FIFTY
U+3248 ( ㉈ ) CIRCLED NUMBER TEN ON BLACK SQUARE...U+324F ( ㉏ ) CIRCLED NUMBER EIGHTY ON BLACK SQUARE
U+24FE ( ⓾ ) DOUBLE CIRCLED NUMBER TEN
U+2789 ( ➉ ) DINGBAT CIRCLED SANS-SERIF NUMBER TEN
U+2793 ( ➓ ) DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN
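The two lists above can be cross-checked against the Numeric_Type values that a runtime's UCD currently assigns. The sketch below assumes Python's unicodedata module. The names GROUP_A, GROUP_B, numeric_type, and current_types are hypothetical. The span written above as U+3251...U+32BF is entered as two blocks, U+3251..U+325F and U+32B1..U+32BF, because the code points between them are not circled numbers. In a UCD where the proposed changes have not been applied, group A should report only Digit and group B only Numeric.

```python
import unicodedata

# Inclusive code point ranges transcribed from lists A and B.
GROUP_A = [(0x10E60, 0x10E68), (0x11052, 0x1105A), (0x1369, 0x1371),
           (0x10A40, 0x10A43), (0x19DA, 0x19DA)]
GROUP_B = [(0x2469, 0x2473), (0x247D, 0x2487), (0x2491, 0x249B),
           (0x277F, 0x277F), (0x24EB, 0x24F4),
           (0x3251, 0x325F), (0x32B1, 0x32BF),  # circled 21..35 and 36..50
           (0x3248, 0x324F), (0x24FE, 0x24FE),
           (0x2789, 0x2789), (0x2793, 0x2793)]

def numeric_type(ch):
    """Return the Numeric_Type of ch according to the bundled UCD."""
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

def current_types(ranges):
    """Collect the set of Numeric_Type values found across the ranges."""
    return {numeric_type(chr(cp))
            for first, last in ranges
            for cp in range(first, last + 1)}

if __name__ == "__main__":
    print("A (proposed Digit -> Numeric):", current_types(GROUP_A))
    print("B (proposed Numeric -> Digit):", current_types(GROUP_B))
```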
Background Information
Current contents, for comparison
Text:
Numeric_Type=Decimal & General_Category=Decimal_Number
General_Category=Letter_Number
General_Category=Other_Number
Numeric_Type=Digit
Numeric_Type=Numeric