L2/12-280
Comments on L2/12-268
Kent Karlsson
2012-07-26
In L2/12-268 Richard Wordingham writes (parts in quotes, my comments outside
of quotes):
"... When the numeric values were [recently, not in the UCD yet] corrected for
U+1240F CUNEIFORM NUMERIC SIGN FOUR U to U+12414 CUNEIFORM NUMERIC SIGN NINE U
(from 4 to 9 to 40 to 90), this stopped them being collated as secondary
variants of positional decimal digits."
This is all well and good, as part of the correction.
"Only U+1240F and U+12410 can be considered sexagesimal digits. This unplanned
change to collation was reversed by modifying the sifter program itself. Ken
Whistler has formally proposed that this modification to sifter be removed by
reversing the correction to UnicodeData.txt. Would not a better approach be for
UnicodeData.txt to be correct and keep the incorrect values, possibly with a
tag, in the file used by sifter?"
Or even better, let the correction have the effect it should on the DUCET.
There are more of these corrections to come, as Richard W. points out, and no
other non-"0,...,9" digits get collated as if they were. Indeed, I think the
DUCET itself should be restricted to do this special digit handling only for Nd
digits (Unicode "digit"s), not digits more generally.
And indeed, much bigger changes to the DUCET are planned...
(Ideally, all numerals that are neither spelled out nor alphabetic should be
sorted in numerical order, and indeed ICU has an option to partially do so. In
the short run, though, I don't see it covering numerals other than decimal ones
formed by Nd digits, with the decimal and group separators of the current
locale(?). And things like section "14.4", which is not a single number,
complicate matters for many locales...)
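As a rough illustration of what numerical ordering limited to Nd digits could look like, here is a minimal Python sketch. It is not ICU's implementation, and the name numeric_sort_key is made up for this example. It relies on two documented Python behaviours: in a str pattern, \d matches any character with General_Category Nd, and int() accepts such digits.

```python
import re

# In a str pattern, \d matches any character whose General_Category is Nd.
_DIGIT_RUN = re.compile(r"(\d+)")


def numeric_sort_key(text):
    """Hypothetical sort key: runs of Nd digits compare by integer value.

    re.split with a capturing group returns alternating pieces:
    even indexes hold non-digit text, odd indexes hold digit runs.
    """
    key = []
    for index, piece in enumerate(_DIGIT_RUN.split(text)):
        if index % 2 == 1:
            key.append(int(piece))  # int() accepts any Nd digits
        else:
            key.append(piece)
    return key


# "\u0662" is ARABIC-INDIC DIGIT TWO, an Nd digit with value 2.
sections = ["sec 10", "sec 9", "sec \u0662"]
print(sorted(sections, key=numeric_sort_key))
```

With this key, "14.4" sorts before "14.10" because the two digit runs are compared separately. That suits section numbers but would be wrong for decimal fractions, which is the kind of complication mentioned above.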
"1) Of the cuneiform numbers and punctuation, I can confirm that only the
members of the DISH, ASH and ASH TENU series truly have the values in the range
1 to 9. The DISH series are sexagesimal digits, not mere numbers, so I believe
they should have numeric type "digit"."
They cannot be Nd because they do not form 0-9 coded sequentially (which is a
requirement for Nd-ness). They can, and should, be No, though. But that goes for
all of the Nl Cuneiform number characters.
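The sequential-coding requirement for Nd can be expressed as a small check. The following Python sketch is only an illustration: the function name forms_decimal_run and the dictionary layout (code point mapped to digit value) are assumptions made for this example, not part of any Unicode tool.

```python
def forms_decimal_run(digit_values):
    """Return True if the values 0..9 sit on ten consecutive code points.

    digit_values maps a code point (int) to its digit value (int).
    This models the requirement stated above: an Nd digit set must be
    0-9 coded sequentially, in ascending order.
    """
    if len(digit_values) != 10:
        return False
    start = min(digit_values)
    return all(digit_values.get(start + value) == value for value in range(10))


# ASCII digits U+0030..U+0039 satisfy the requirement.
ascii_digits = {0x30 + v: v for v in range(10)}

# Counting rod tens digits U+1D369..U+1D371 carry values 1..9 only;
# there is no zero in the run, so the check fails.
rod_tens = {0x1D369 + (v - 1): v for v in range(1, 10)}

print(forms_decimal_run(ascii_digits), forms_decimal_run(rod_tens))
```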
"The other series have other values, being multiples of 10, 60, 600, 3,600,
36,000 or 216,000. This remark also applies to the following six numbers
proposed in L2/12-207 (a.k.a. ISO/IEC JTC1/SC2/WG2/N4277)."
I agree, on both counts. And I am working on a proposal to fix those too. In
the meantime, here is my draft list (except for those in N4277):
𒐕 12415;CUNEIFORM NUMERIC SIGN ONE GESH2;No;0;L;;;;60;N;;;;; // instead of <1, 0> in base 60
𒐖 12416;CUNEIFORM NUMERIC SIGN TWO GESH2;No;0;L;;;;120;N;;;;; // instead of <2, 0> in base 60
𒐗 12417;CUNEIFORM NUMERIC SIGN THREE GESH2;No;0;L;;;;180;N;;;;;
𒐘 12418;CUNEIFORM NUMERIC SIGN FOUR GESH2;No;0;L;;;;240;N;;;;;
𒐙 12419;CUNEIFORM NUMERIC SIGN FIVE GESH2;No;0;L;;;;300;N;;;;;
𒐚 1241A;CUNEIFORM NUMERIC SIGN SIX GESH2;No;0;L;;;;360;N;;;;;
𒐛 1241B;CUNEIFORM NUMERIC SIGN SEVEN GESH2;No;0;L;;;;420;N;;;;;
𒐜 1241C;CUNEIFORM NUMERIC SIGN EIGHT GESH2;No;0;L;;;;480;N;;;;;
𒐝 1241D;CUNEIFORM NUMERIC SIGN NINE GESH2;No;0;L;;;;540;N;;;;;
𒐞 1241E;CUNEIFORM NUMERIC SIGN ONE GESHU;No;0;L;;;;600;N;;;;; // instead of <10, 0> in base 60
𒐟 1241F;CUNEIFORM NUMERIC SIGN TWO GESHU;No;0;L;;;;1200;N;;;;; // instead of <20, 0> in base 60
𒐠 12420;CUNEIFORM NUMERIC SIGN THREE GESHU;No;0;L;;;;1800;N;;;;;
𒐡 12421;CUNEIFORM NUMERIC SIGN FOUR GESHU;No;0;L;;;;2400;N;;;;;
𒐢 12422;CUNEIFORM NUMERIC SIGN FIVE GESHU;No;0;L;;;;3000;N;;;;;
xxxxx;CUNEIFORM NUMERIC SIGN ONE SHAR2;No;0;L;;;;3600;N;;;;; // 1*60*60, i.e. <1, 0, 0> in base 60, NEW, TO BE PROPOSED, disunify from 1212D;CUNEIFORM SIGN HI
𒐣 12423;CUNEIFORM NUMERIC SIGN TWO SHAR2;No;0;L;;;;7200;N;;;;; // 2*60*60, i.e. <2, 0, 0> in base 60
𒐤 12424;CUNEIFORM NUMERIC SIGN THREE SHAR2;No;0;L;;;;10800;N;;;;;
𒐥 12425;CUNEIFORM NUMERIC SIGN THREE SHAR2 VARIANT FORM;No;0;L;;;;10800;N;;;;;
𒐦 12426;CUNEIFORM NUMERIC SIGN FOUR SHAR2;No;0;L;;;;14400;N;;;;;
𒐧 12427;CUNEIFORM NUMERIC SIGN FIVE SHAR2;No;0;L;;;;18000;N;;;;;
𒐨 12428;CUNEIFORM NUMERIC SIGN SIX SHAR2;No;0;L;;;;21600;N;;;;;
𒐩 12429;CUNEIFORM NUMERIC SIGN SEVEN SHAR2;No;0;L;;;;25200;N;;;;;
𒐪 1242A;CUNEIFORM NUMERIC SIGN EIGHT SHAR2;No;0;L;;;;28800;N;;;;;
𒐫 1242B;CUNEIFORM NUMERIC SIGN NINE SHAR2;No;0;L;;;;32400;N;;;;;
𒐬 1242C;CUNEIFORM NUMERIC SIGN ONE SHARU;No;0;L;;;;36000;N;;;;; // i.e. <10, 0, 0> in base 60
𒐭 1242D;CUNEIFORM NUMERIC SIGN TWO SHARU;No;0;L;;;;72000;N;;;;;
𒐮 1242E;CUNEIFORM NUMERIC SIGN THREE SHARU;No;0;L;;;;108000;N;;;;;
𒐯 1242F;CUNEIFORM NUMERIC SIGN THREE SHARU VARIANT FORM;No;0;L;;;;108000;N;;;;;
𒐰 12430;CUNEIFORM NUMERIC SIGN FOUR SHARU;No;0;L;;;;144000;N;;;;;
𒐱 12431;CUNEIFORM NUMERIC SIGN FIVE SHARU;No;0;L;;;;180000;N;;;;;
--
𒐲 12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;No;0;L;;;;216000;N;;;;; // <1, 0, 0, 0> in base 60
𒐳 12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;No;0;L;;;;432000;N;;;;; // <2, 0, 0, 0> in base 60
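The values in the list above follow directly from the base-60 positions given in the comments. A minimal Python sketch of that arithmetic (the function name from_sexagesimal is just for this illustration):

```python
def from_sexagesimal(digits):
    """Convert base-60 digits, most significant first, to an integer."""
    value = 0
    for digit in digits:
        if not 0 <= digit < 60:
            raise ValueError(f"not a base-60 digit: {digit}")
        value = value * 60 + digit
    return value


# Positions as written in the comments of the draft list.
print(from_sexagesimal([1, 0]))         # ONE GESH2
print(from_sexagesimal([10, 0]))        # ONE GESHU
print(from_sexagesimal([1, 0, 0]))      # ONE SHAR2
print(from_sexagesimal([10, 0, 0]))     # ONE SHARU
print(from_sexagesimal([1, 0, 0, 0]))   # SHAR2 TIMES GAL PLUS DISH
```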
That leaves a few Sumero-Akkadian Cuneiform number characters that I haven't
figured out the values for.
Another problem is that several Sumero-Akkadian Cuneiform digits are unified
with Sumero-Akkadian Cuneiform letters. And the letters have other properties
than the digits should have. I'm working on a proposal to disunify the Cuneiform
digits from Cuneiform letters. My current draft list is as follows:
𒀹 xxxxx;CUNEIFORM NUMERIC SIGN ONE ASH TENU;No;0;L;;;;1;N;;;;; // disunify from 12039;CUNEIFORM SIGN ASH ZIDA TENU
𒀸 xxxxx;CUNEIFORM NUMERIC SIGN ONE ASH;No;0;L;;;;1;N;;;;; // disunify from 12038;CUNEIFORM SIGN ASH
𒋰 xxxxx;CUNEIFORM NUMERIC SIGN TWO ASH VARIANT FORM;No;0;L;;;;2;N;;;;; // disunify from 122F0;CUNEIFORM SIGN TAB
yyyyy;CUNEIFORM NUMERIC SIGN FOUR ASH VARIANT FORM A;No;0;L;;;;4;N;;;;; // like 1243C but horizontal
yyyyy;CUNEIFORM NUMERIC SIGN FOUR ASH VARIANT FORM B;No;0;L;;;;4;N;;;;; // like 144BE but horizontal
𒄿 xxxxx;CUNEIFORM NUMERIC SIGN FIVE ASH VARIANT FORM;No;0;L;;;;5;N;;;;; // disunify from 1213F;CUNEIFORM SIGN I; 3,2
yyyyy;CUNEIFORM NUMERIC SIGN FIVE ASH VARIANT FORM A;No;0;L;;;;5;N;;;;; // 2,2,1
yyyyy;CUNEIFORM NUMERIC SIGN SEVEN VARIANT FORM;No;0;L;;;;7;N;;;;; // like 12442 but horizontal
𒁹 xxxxx;CUNEIFORM NUMERIC SIGN ONE DISH;No;0;L;;;;1;N;;;;; // disunify from 12079;CUNEIFORM SIGN DISH
𒈫 xxxxx;CUNEIFORM NUMERIC SIGN TWO DISH;No;0;L;;;;2;N;;;;; // disunify from 1222B;CUNEIFORM SIGN MIN
yyyyy;CUNEIFORM NUMERIC SIGN FIVE DISH VARIANT FORM A;No;0;L;;;;5;N;;;;; // 2s+3s
yyyyy;CUNEIFORM NUMERIC SIGN FIVE DISH VARIANT FORM B;No;0;L;;;;5;N;;;;; // like 12403 but vertical (2s+2s+1)
yyyyy;CUNEIFORM NUMERIC SIGN SIX DISH VARIANT FORM A;No;0;L;;;;6;N;;;;; // like 12404 but vertical
𒌋 xxxxx;CUNEIFORM NUMERIC SIGN ONE U;No;0;L;;;;10;N;;;;; // disunify from 1230B;CUNEIFORM SIGN U
xxxxx;CUNEIFORM NUMERIC SIGN TWO U;No;0;L;;;;20;N;;;;; // do not unify with U U (proposed in L2/12-207: 12399;CUNEIFORM SIGN U U;Lo;0;L;;;;;N;;;;;)
𒑱 yyyyy;CUNEIFORM NUMERIC SIGN TWO U VARIANT FORM;No;0;L;;;;20;N;;;;; // disunify from 12471, CUNEIFORM PUNCTUATION SIGN VERTICAL COLON
𒌍 xxxxx;CUNEIFORM NUMERIC SIGN THREE U;No;0;L;;;;30;N;;;;; // disunify from 1230D;CUNEIFORM SIGN U U U
yyyyy;CUNEIFORM NUMERIC SIGN THREE U VARIANT FORM;No;0;L;;;;30;N;;;;; //
xxxxx;CUNEIFORM NUMERIC SIGN ONE SHAR2;No;0;L;;;;3600;N;;;;; // 1*60*60, i.e. <1, 0, 0> in base 60, disunify from 1212D;CUNEIFORM SIGN HI
10 disunifications, and 9 for completing digit design styles (which are already
disunified within the set of Cuneiform digits, but incomplete).
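Draft lines in this UnicodeData.txt style can be checked mechanically. The Python sketch below is only an illustration, with made-up helper names (parse_draft_line, numeric_type). It assumes the 15-field layout of UnicodeData.txt, an optional leading glyph before the code point, and an optional trailing "//" comment, as in the lists above. A stray extra semicolon shows up as a wrong field count.

```python
FIELD_NAMES = (
    "code", "name", "gc", "ccc", "bidi", "decomposition",
    "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "upper", "lower", "title",
)


def parse_draft_line(line):
    """Split one draft line into the 15 UnicodeData.txt fields."""
    data = line.split("//", 1)[0].strip()  # drop the trailing comment
    fields = data.split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # A sample glyph may precede the code point; keep the last token only.
    fields[0] = fields[0].split()[-1]
    return dict(zip(FIELD_NAMES, fields))


def numeric_type(record):
    """Derive Numeric_Type from which of fields 6, 7, 8 are filled."""
    if record["decimal"]:
        return "Decimal"
    if record["digit"]:
        return "Digit"
    if record["numeric"]:
        return "Numeric"
    return "None"


record = parse_draft_line(
    "12415;CUNEIFORM NUMERIC SIGN ONE GESH2;No;0;L;;;;60;N;;;;; // <1, 0>"
)
print(record["code"], record["gc"], record["numeric"], numeric_type(record))
```

For every entry in the draft lists only field 8 is filled, so the derived numeric type is Numeric, consistent with the No general category proposed here.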
I think
𒑉 12449;CUNEIFORM NUMERIC SIGN NINE VARIANT FORM ILIMMU A;No;0;L;;;;9;N;;;;;
should be
𒑉 12449;CUNEIFORM NUMERIC SIGN NINE VARIANT FORM ILIMMU A;No;0;L;;;;3;N;;;;;
i.e. 3 rather than 9 despite the name.
I also think there should be a zero digit (used as a filler digit only, the
concept of zero was not invented at the time), as there was a zero digit used
instead of just space (or even ambiguity).
"2) U+1D369 COUNTING ROD TENS DIGIT ONE to U+1D371 COUNTING ROD TENS DIGIT NINE
are digits in a decimal place value system, so they should have numeric type
"digit" and values 1 to 9."
They cannot be Nd, since the zero digit is coded elsewhere, and there are other
complications as well.
B.t.w. see also CLDR ticket 4473, http://unicode.org/cldr/trac/ticket/4473,
which gives RBNF rule sets for handling counting rod numerals, as well as some
other Chinese numeral systems not already covered by CLDR RBNF rule sets.
"3) The alternating between digit sets is also seen in the Telugu fraction
digits, U+0C78 to U+0C7E. It is not clear to me why these have numeric type
"numeric" rather than "digit"."
They are not 0-9 encoded sequentially, and therefore cannot be Nd (a.k.a.
"digit" which is a confusing alias).
"4) U+3021 HANGZHOU NUMERAL ONE to U+3029 HANGZHOU NUMERAL NINE are digits in a
decimal place value system, so they should have numeric type "digit"."
Same as for Counting rods, these cannot be Nd (digit), for the same reasons.
Again see CLDR ticket 4473.
-------------------