Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #5546 (accepted data)

Opened 5 years ago

Last modified 3 years ago

follow DUCET with other numbers among symbols

Reported by: markus
Owned by: markus
Component: collation
Data Locale: root
Phase:
Review:
Weeks: 0.1
Data Xpath:
Xref:

Description

I propose that we follow the DUCET in how we order "other numbers". That is, I propose we stop reordering them from the symbol group into the digit group.

Details:

Compared with the DUCET, "CLDR groups the numbers together after currency symbols, instead of splitting them with some before and some after." (see the LDML spec).

There are about 200 "other number" characters that CLDR modifies, for example U+0BF0 ௰ Tamil Number Ten and U+2180 ↀ Roman Numeral One Thousand C D. On the DUCET symbol chart they are the characters from U+09F4 to U+1D371.

CLDR sorts all of these in the "digit" reordering group, just before digit 0. They do not sort in the order of numeric values, they are not digits, and they do not decompose to digits.
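The properties of the two example characters can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(ch):
    """Collect the properties relevant to the 'other numbers' discussion."""
    return {
        "name": unicodedata.name(ch),
        # "Nd" would mean decimal digit; "No" and "Nl" are other kinds of numbers.
        "category": unicodedata.category(ch),
        # Numeric value, or None if the character has none.
        "numeric": unicodedata.numeric(ch, None),
        # Decimal digit value, or None if the character is not a decimal digit.
        "decimal": unicodedata.decimal(ch, None),
        # Empty string if the character has no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
    }

tamil_ten = describe("\u0BF0")       # TAMIL NUMBER TEN
roman_thousand = describe("\u2180")  # ROMAN NUMERAL ONE THOUSAND C D

for info in (tamil_ten, roman_thousand):
    print(info)
```

Both characters have a numeric value but no decimal digit value and no decomposition, which matches the statement that they are neither digits nor decomposable to digits.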

With numeric sorting turned on, and with the computed primary weights for numeric sorting placed at the beginning of the digit group, as we defined in LDML 22, the "other number" characters sort between the digits-as-numbers and the compatibility digits.

The current reordering puts together all of the characters that have General_Category=Number, but I do not see that this order is better, in any practical sense, than their DUCET order.

I think it is desirable to reduce the difference between the DUCET and the CLDR root, to reduce surprises for users and to reduce our tooling and documentation burden.

Attachments

sort-with-digit-1.txt (13.7 KB) - added by markus 5 years ago.
characters in UCA 6.3 generated allkeys_DUCET.txt that have the same primary weight as ASCII digit 1

Change History

comment:1 Changed 5 years ago by mark

I'm sympathetic, but have some concerns. Currently the UCA has the following order:

general symbols
some strange non-decimal numbers
currency signs
0
digits
variants of digits
1
digits
variants
𒐴
other strange non-decimal numbers
½ fractions
① ②
sequences
12 ...
...
2 ...
A
letters
L Nl values interleaved with digits.

I think the least surprising order for numeric sorting would be to have all of the items that can be interpreted as decimal numbers sorted as decimal numbers, all in one group, and all the other numbers (other than Nl) sorted in another group.

general symbols
currency signs

following in numeric order
0
digits
½ fractions
variants of digits
1 digits
variants
2 ...
① ②
sequences
12 ...
...

some strange non-decimal numbers
𒐴
other strange non-decimal numbers

A letters
L
Nl values interleaved with digits.
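The grouping proposed above can be illustrated on single characters. This is only a rough sketch, not an implementation of the UCA or of CLDR: the test for "can be interpreted as a decimal number" (the NFKD form consists only of decimal digits and U+2044 FRACTION SLASH) is an assumption made for illustration, and the names `is_decimal_like` and `proposed_key` are hypothetical.

```python
import unicodedata

FRACTION_SLASH = "\u2044"

def is_decimal_like(ch):
    """Assumed criterion: the compatibility decomposition contains only
    decimal digits (General_Category Nd) and the fraction slash."""
    expanded = unicodedata.normalize("NFKD", ch)
    return all(
        unicodedata.category(c) == "Nd" or c == FRACTION_SLASH
        for c in expanded
    )

def proposed_key(ch):
    """Sort key: group 0 = decimal-like items in numeric order,
    group 1 = other numbers, group 2 = Nl (left with the letters, not modeled)."""
    if unicodedata.category(ch) == "Nl":
        group = 2
    elif is_decimal_like(ch):
        group = 0
    else:
        group = 1
    # Characters without a numeric value sort last within their group.
    return (group, unicodedata.numeric(ch, float("inf")), ch)

# TAMIL NUMBER TEN, DIGIT TWO, ONE HALF, CIRCLED DIGIT ONE, DIGIT ZERO, DIGIT ONE
sample = ["\u0BF0", "2", "\u00BD", "\u2460", "0", "1"]
ordered = sorted(sample, key=proposed_key)
print(ordered)
```

With this key, ½ lands between 0 and 1, ① sorts next to 1, and ௰ falls into the separate group after all decimal-like items.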

The UCA interleaves some questionable items in with digits, like ½ between 1 and 2. For example, the characters that have or contain the same primary weight as "1" in http://www.unicode.org/Public/UCA/6.3.0/allkeys-6.3.0d1.txt include the following:

[⑴ ⑽-⒆ 11𝟏𝟙𝟣𝟭𝟷①⓵❶➀➊¹ ₁١۱𐹠߁፩𐒡१১੧૧୧௧౧౹౼೧൧꯱꣑᥇ ᧑᧚᪁᪑๑໑༡༪᱁꤁၁႑𑄷១៱꩑᭑꧑᮱᠑᱑ ꘡𑃱𐄇𐅂𐅘-𐅚𐌠𐏑𒐕𒐞𒐬𒐴𒑏𒑘𐩽𐤖𐡘𐭘𐭸𑇑 𑛁𑁧𑁒𐩀𝍠 🄂 ⒈ ⅟ ⅒ ½ ⅓ ¼ ⅕ ⅙ ⅐ ⅛ ⅑ ⑩⓾❿➉➓㉈ ⒑ ㏩ ㋉ ㍢ ⑪⓫ ⒒ ㏪ ㋊ ㍣ ⑫⓬ ⒓ ㏫ ㋋ ㍤ ⑬⓭ ⒔ ㏬ ㍥ ⑭⓮ ⒕ ㏭ ㍦ ⑮ ⓯ ⒖ ㏮ ㍧ ⑯⓰ ⒗ ㏯ ㍨ ⑰⓱ ⒘ ㏰ ㍩ ⑱⓲ ⒙ ㏱ ㍪ ⑲⓳ ⒚ ㏲ ㍫ ㏠ ㋀ ㍙ ㉑ ㏴ ㍭ ㉛ ㏾ ㊶ 〡]

But maybe we just don't care much about the outlying items, like:
some strange non-decimal numbers
𒐴
other strange non-decimal numbers

comment:2 Changed 5 years ago by markus

I agree there's cruft in the DUCET; I am just not sure it's worth reordering it, or so much of it.

We could collect numeric cruft at the end of the digit group, or we could move numeric cruft from the digit group next to the numeric cruft that the DUCET has in the symbol group, or we could leave the numeric cruft where it is.

I attached a file with all of the characters that sort with "1" in the UCA 6.3 DUCET. (There are none with a primary weight between "1" and "2".)
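A list like the attached one can be derived by scanning an allkeys-style file for entries whose collation elements have or contain the primary weight of "1". The following is a minimal sketch under stated assumptions: it is not the tool that produced the attachment, the function names are hypothetical, and the weights in `SAMPLE` are made up for illustration rather than taken from the real DUCET.

```python
import re

# One collation element such as [.1001.0020.0002] or [*0300.0020.001E.00BD];
# the fourth field appears in older allkeys versions and is optional here.
CE_RE = re.compile(
    r"\[[.*]([0-9A-Fa-f]{4,6})\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}"
    r"(?:\.[0-9A-Fa-f]{4,6})?\]"
)

def parse_allkeys(lines):
    """Map each character (or contraction) to the list of its primary weights."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("@"):   # skip blanks and @ directives
            continue
        code_points, _, elements = line.partition(";")
        key = "".join(chr(int(cp, 16)) for cp in code_points.split())
        table[key] = [int(m.group(1), 16) for m in CE_RE.finditer(elements)]
    return table

def sharing_primary(table, ch):
    """Return entries with any collation element using the first primary of ch."""
    target = table[ch][0]
    return sorted(s for s, primaries in table.items() if target in primaries)

# Illustrative data only: these weights are NOT the real DUCET values.
SAMPLE = """\
@version 0.0.0
0030 ; [.1000.0020.0002.0030] # DIGIT ZERO
0031 ; [.1001.0020.0002.0031] # DIGIT ONE
00B9 ; [.1001.0020.0014.00B9] # SUPERSCRIPT ONE
00BD ; [.1001.0020.001E.00BD][*0300.0020.001E.00BD][.1002.0020.001E.00BD] # VULGAR FRACTION ONE HALF
0032 ; [.1002.0020.0002.0032] # DIGIT TWO
"""

table = parse_allkeys(SAMPLE.splitlines())
with_one = sharing_primary(table, "1")
print(with_one)
```

To run it against a real file, pass the open file object to `parse_allkeys` in place of `SAMPLE.splitlines()`.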

comment:3 Changed 5 years ago by markus

  • Owner changed from anybody to markus
  • Status changed from new to assigned

Need to review together with other diffs between DUCET & CLDR root.

comment:4 Changed 3 years ago by markus

  • Data Locale set to root
  • Type changed from enhancement to data
  • Component changed from uca to collation

comment:5 Changed 3 years ago by srl

  • Status changed from assigned to accepted