[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #8164 (accepted data)

Opened 2 years ago

Last modified 2 years ago

RBNF rules for "alphabetic" numbering

Reported by: kent.karlsson14@…
Owned by: grhoten
Component: other
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


I took another look at making RBNF rules for "alphabetic" numbering, which is most often
used for alphabetically itemized lists, like this:

a) first item
b) second item
aa) one after end of alphabet item
ab) ...
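
The numbering shown above (a, b, ..., z, aa, ab, ...) has no zero digit, so it is not ordinary base 26. A minimal Python sketch of the conversion follows. The function name `alphabetic_numeral` and the default a-z alphabet are assumptions for illustration only; this is not the content of the attached scripts.

```python
import string


def alphabetic_numeral(n, alphabet=string.ascii_lowercase):
    """Return the alphabetic "numeral" for n >= 1 (1 -> a, 26 -> z, 27 -> aa)."""
    if n < 1:
        raise ValueError("alphabetic numbering starts at 1")
    base = len(alphabet)
    letters = []
    while n > 0:
        n -= 1                          # shift to 0-based so the last letter is reachable
        letters.append(alphabet[n % base])
        n //= base
    return "".join(reversed(letters))   # letters were collected least significant first


print([alphabetic_numeral(i) for i in (1, 2, 26, 27, 28)])
# → ['a', 'b', 'z', 'aa', 'ab']
```

Passing a different sequence as `alphabet` gives the same scheme for another script or letter set.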

Another use (with A-Z) is column "numbering" in spreadsheets. Excel allows up to
16,384 columns (A through XFD), so its largest column indices need three letters.
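
As a quick check of the three-letter claim, the reverse mapping from a column label to its number can be sketched in Python. The helper name `column_number` is hypothetical, and the sketch assumes plain A-Z labels only.

```python
def column_number(label):
    """Convert a spreadsheet-style column label (A, B, ..., Z, AA, ...) to its 1-based number."""
    if not label:
        raise ValueError("empty column label")
    number = 0
    for ch in label.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Each letter is a digit with value 1..26; there is no zero digit.
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


print(column_number("ZZ"))   # → 702, the last two-letter column
print(column_number("XFD"))  # → 16384
```

Since two letters only reach 702, any sheet wider than that needs three-letter labels.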

It may be rare for alphabetically numbered item lists in actual documents to be very
long, so even two letters are rarely needed, let alone three. But CSS3
allows for (in principle) arbitrarily many letters in the alphabetic "numerals".
However, 5 letters should be much more than enough for all but very rare uses
(these "numerals" are not really suitable for expressing numbers anyway, let alone
large ones).
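
To put the 5-letter limit in perspective, the number of values expressible with at most k letters of an n-letter alphabet is n + n^2 + ... + n^k. The short Python sketch below tabulates this for a 26-letter alphabet; the function name `capacity` is an assumption for illustration.

```python
def capacity(alphabet_size, max_letters):
    """Count the values expressible with 1..max_letters letters."""
    return sum(alphabet_size ** k for k in range(1, max_letters + 1))


for letters in range(1, 6):
    print(letters, capacity(26, letters))
# 1 26
# 2 702
# 3 18278
# 4 475254
# 5 12356630
```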

Generating such numerals via RBNF turned out to be perfectly feasible, with
reasonable size for the rule sets. But the rules contain a lot of numbers... To manage
that I wrote a script for generating all those numbers in the right places.

In the attached zip file you will find:

A) A bash script for generating RBNF rule sets (lttr.sh) for an alphabet given in arguments.

B) A bash script (indexs.sh), which calls lttr.sh, for generating rule sets for a variety
of alphabets. Most are taken from the "index" exemplars in CLDR (sometimes modified,
usually for round-tripping); some are also from http://www.w3.org/TR/predefined-counter-styles/
(those with "system: alphabetic"). A number of possibilities have been commented out, for
various reasons. The alphabets also need to be reviewed if CLDR is to cover these
alphabetic "numberings".

C) The result of running the indexs.sh script, one (.txt) file per locale covered for this.
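
On the round-tripping mentioned in B): one way an alphabet can fail to round-trip, assumed here purely for illustration, is when it contains a multi-letter entry (such as a digraph) that can also be read as two single letters. The Python sketch below searches for the first two values that produce the same label. The names `label` and `first_collision` and both sample alphabets are hypothetical; they are not taken from the attached scripts or from CLDR data.

```python
def label(n, symbols):
    """Alphabetic label for n >= 1 over the given sequence of symbols."""
    out = []
    while n > 0:
        n -= 1
        out.append(symbols[n % len(symbols)])
        n //= len(symbols)
    return "".join(reversed(out))


def first_collision(symbols, limit=1000):
    """Return (m, n, text) for the first two values sharing a label, or None."""
    seen = {}
    for n in range(1, limit + 1):
        text = label(n, symbols)
        if text in seen:
            return seen[text], n, text
        seen[text] = n
    return None


plain = ["a", "b", "c", "h"]                 # hypothetical alphabet, single letters only
with_digraph = ["a", "b", "c", "ch", "h"]    # hypothetical alphabet with a digraph entry

print(first_collision(plain))         # → None
print(first_collision(with_digraph))  # → (4, 20, 'ch')
```

A label that two different values share cannot be parsed back unambiguously, which is one plausible reason to adjust an index alphabet before using it for numbering.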

After changing either script, remove the .txt files and rerun indexs.sh. Before
rerunning, move the scripts to the folder enclosing the arbnf folder (which is where
they are in the zip file).


arbnf.zip (194.7 KB) - added by kent.karlsson14@… 2 years ago.
Zip file with scripts and data

Change History

Changed 2 years ago by kent.karlsson14@…

Zip file with scripts and data

comment:1 Changed 2 years ago by markus

  • Type set to data

comment:2 Changed 2 years ago by markus

  • Component changed from data-other to other

comment:3 Changed 2 years ago by markus

  • Owner emmons deleted

comment:4 Changed 2 years ago by emmons

  • Owner set to grhoten
  • Status changed from new to accepted

George to evaluate.

comment:5 Changed 2 years ago by grhoten

This is an interesting task, but if it is accepted for inclusion in CLDR, I think it deserves a brand new numbering type that is separate from the spellout rules. It essentially takes the collation index characters and turns them into an enumeration for a list. At least, that is what a spot check showed.

In fact, it may just be better suited as a separate API instead of CLDR because it seems that it is being derived from other CLDR data. The implementation would probably be similar to the scripts that were attached in this ticket.

comment:6 Changed 2 years ago by kent.karlsson14@…

I think it would fit well under RBNF OrdinalRules in the respective locales.

Compare http://dev.w3.org/csswg/css-counter-styles/#alphabetic-system
and http://www.w3.org/TR/predefined-counter-styles/.

The latter gives a number of alphabetic (and "fixed") counter styles. Strangely, it has only one for the Latin script and far too many for certain other scripts.

CLDR already covers what CSS refers to as "numeric" (not via RBNF) and several of the "additive" systems (via RBNF, root locale).

