Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #6800(closed enhancement: fixed)

Opened 4 years ago

Last modified 3 years ago

Hebrew item and page numbers (RBNF) - initial tests, mostly rolled back out

Reported by: kent.karlsson14@… Owned by: grhoten
Component: main Data Locale: root
Phase: rc Review: pedberg
Weeks: Data Xpath:
Xref:

ticket:7888

Description

Hebrew uses a slightly different number system for item and page numbers. According to the discussion of numbering styles in CSS, going up to just above 2000 seems to be enough. This system should not be used for year numbers; for those, use the Hebrew system already defined in CLDR's RBNF data.

This system uses no punctuation and no word for thousand; instead, it uses repeated tav letters.

%hebrew-item:
-x: −>>;
x.x: =#,##0.00=;
0: אפס;
1: א;
2: ב;
3: ג;
4: ד;
5: ה;
6: ו;
7: ז;
8: ח;
9: ט;
10: י[>>];
15: טו;
16: טז;
17: י>>;
20: כ[>>];
30: ל[>>];
40: מ[>>];
50: נ[>>];
60: ס[>>];
70: ע[>>];
80: פ[>>];
90: צ[>>];
100: ק[>>];
200: ר[>>];
300: ש[>>];
400: ת[>>];
500: תק[>>];
600: תר[>>];
700: תש[>>];
800: תת[>>];
900: תתק[>>];
1000: תתר[>>];
1100: תתש[>>];
1200: תתת[>>];
1300: תתתק[>>];
1400: תתתר[>>];
1500: תתתש[>>];
1600: תתתת[>>];
1700: תתתתק[>>];
1800: תתתתר[>>];
1900: תתתתש[>>];
2000: תתתתת[>>];
2100: =#,##0=;
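
The additive scheme described above can be sketched in Python. This is only an illustration, not the RBNF engine: the function name `hebrew_item` and the table layout are assumptions made for this example, while the letter values and the special forms for 15 and 16 are taken from the rules in this ticket. Zero is left out of the sketch.

```python
# Letter values copied from the rules in this ticket; names are illustrative.
UNITS = ["", "א", "ב", "ג", "ד", "ה", "ו", "ז", "ח", "ט"]
TENS = ["", "י", "כ", "ל", "מ", "נ", "ס", "ע", "פ", "צ"]
HUNDREDS = ["", "ק", "ר", "ש"]  # 100, 200, 300; 400 is tav
TAV = "ת"


def hebrew_item(n: int) -> str:
    """Format 1..2099 additively, repeating tav once per 400."""
    if not 1 <= n <= 2099:
        raise ValueError("this sketch only covers 1..2099")
    hundreds, rest = divmod(n, 100)
    out = TAV * (hundreds // 4) + HUNDREDS[hundreds % 4]
    if rest == 15:
        out += "טו"  # written as 9 + 6
    elif rest == 16:
        out += "טז"  # written as 9 + 7
    else:
        out += TENS[rest // 10] + UNITS[rest % 10]
    return out
```

No word for thousand appears anywhere: 2000 is simply five tav letters (5 × 400).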

Change History

comment:1 Changed 4 years ago by emmons

  • Owner changed from anybody to grhoten
  • Priority changed from assess to medium
  • Type changed from unknown to enhancement
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 25rc

comment:2 Changed 4 years ago by kent.karlsson14@…

As Matitiahu Allouche pointed out to me, I made a mistake for numbers between 1100 and 2099. The rules should read as follows (note the added "/100"):

%hebrew-item:
-x: −>>;
x.x: =#,##0.00=;
0: אפס;
1: א;
2: ב;
3: ג;
4: ד;
5: ה;
6: ו;
7: ז;
8: ח;
9: ט;
10: י[>>];
15: טו;
16: טז;
17: י>>;
20: כ[>>];
30: ל[>>];
40: מ[>>];
50: נ[>>];
60: ס[>>];
70: ע[>>];
80: פ[>>];
90: צ[>>];

100: ק[>>];
200: ר[>>];
300: ש[>>];
400: ת[>>];

500: תק[>>];
600: תר[>>];
700: תש[>>];
800: תת[>>];

900: תתק[>>];
1000/100: תתר[>>];
1100/100: תתש[>>];
1200/100: תתת[>>];

1300/100: תתתק[>>];
1400/100: תתתר[>>];
1500/100: תתתש[>>];
1600/100: תתתת[>>];

1700/100: תתתתק[>>];
1800/100: תתתתר[>>];
1900/100: תתתתש[>>];
2000/100: תתתתת[>>];

2100: =#,##0=;
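
The effect of the added "/100" can be illustrated with a small Python sketch. It assumes the usual RBNF semantics, in which `bv/rad:` sets a rule's radix and the `>>` substitution receives the number modulo the highest power of the radix that does not exceed the base value. The function names are hypothetical and exist only for this example.

```python
def rbnf_divisor(base_value: int, radix: int = 10) -> int:
    """Highest power of `radix` that does not exceed `base_value`."""
    divisor = 1
    while divisor * radix <= base_value:
        divisor *= radix
    return divisor


def remainder_for(n: int, base_value: int, radix: int = 10) -> int:
    """Value that a `>>` substitution would receive under this sketch."""
    return n % rbnf_divisor(base_value, radix)


# "1100: ..." (default radix 10): the divisor is 1000, so 1100 leaves 100
# behind and an extra hundreds letter would follow the rule text.
print(remainder_for(1100, 1100))       # 100
# "1100/100: ..." (radix 100): the divisor is 100, so nothing is left over.
print(remainder_for(1100, 1100, 100))  # 0
print(remainder_for(1150, 1100, 100))  # 50
```

Under the same assumption, the rule for 1000 was not affected in the original version, because 1000 to 1099 modulo 1000 already yields the intended 0 to 99.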

comment:3 Changed 4 years ago by kent.karlsson14@…

The use of words for zero and for thousand has been criticised, so those words are
not used in the rules below. In addition, those rules cover numbers up to the "normal"
limit for RBNF (which the current (CLDR 24) rules for Hebrew numerals do not), except
for the %hebrew-item ruleset (which stops at 2099).

For zero, gershayim is used in these rules, which is consistent with its use in other
Hebrew numerals (adding 0 to the result). That use may be debatable, but it does avoid
using a word for zero, and allows for 1 000, 2 000, ..., 1 000 000, etc. to be expressed
unambiguously, and using only Hebrew characters, but not words. Formally, the Hebrew
numeral system does not have a notation for zero, but here we cannot leave it as the
empty string, as that would be unreadable. Whether there should be a space or not
between groups is an open issue. But most examples I see (see references to web pages
below) do use a space.

There are also more words that should be avoided (even though it is not considered
an error not to avoid them), affecting (at least) 298, 304, 344, 698 and 744 (thanks Mati).
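
A minimal Python sketch of this exception handling follows. The reordered spellings are copied from the rules in this comment; the helper names (`additive`, `item_letters`) and the lookup-table approach are assumptions made for illustration and are not how RBNF evaluates the rules.

```python
# (value, letter) pairs in descending order; values as used in this ticket.
LETTERS = [(400, "ת"), (300, "ש"), (200, "ר"), (100, "ק"),
           (90, "צ"), (80, "פ"), (70, "ע"), (60, "ס"), (50, "נ"),
           (40, "מ"), (30, "ל"), (20, "כ"), (10, "י"),
           (9, "ט"), (8, "ח"), (7, "ז"), (6, "ו"), (5, "ה"),
           (4, "ד"), (3, "ג"), (2, "ב"), (1, "א")]

# Reordered spellings copied from the rules in this comment.
REORDERED = {298: "רחצ", 304: "דש", 344: "שדמ", 698: "תרחצ", 744: "תשדמ"}


def additive(n: int) -> str:
    """Plain additive spelling, largest letters first."""
    out = ""
    for value, letter in LETTERS:
        while n >= value:
            if n in (15, 16):  # written 9 + 6 and 9 + 7
                return out + ("טו" if n == 15 else "טז")
            out += letter
            n -= value
    return out


def item_letters(n: int) -> str:
    """Additive spelling with the five reordered exceptions applied."""
    return REORDERED.get(n, additive(n))
```

For example, the plain additive spelling of 298 is resh, tsadi, het, and the exception swaps the last two letters; 299 is unaffected.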

The use of punctuation marks for numbers between 0 and 999 seems to be for when the
numbers are used in running text, not in general if the context (like page numbers
at the bottom of a page, or item numbers introducing a text item) requires a number.
Hence, "normal" (unmarked) and "intext" variants are given below.
One might consider adding a %hebrew-item-intext, like the %hebrew-item ruleset, but
inserting Hebrew punctuation. (The %hebrew-item ruleset is for compatibility with CSS
Hebrew numbering, with the addition of laying down the order of the letters, which CSS
does not specify for an additive system.)

Further, there exists an alternative notation for 500, 600, 700, 800 and 900, using
"final" letters (see http://www.agapebiblestudy.com/charts/letter_number_equivalent.htm,
http://hebrew4christians.com/Grammar/Unit_One/Numeric_Values/numeric_values.html,
http://en.wikipedia.org/wiki/Hebrew_alphabet#Numeric_values_of_letters,
http://www.i18nguy.com/unicode/hebrew-numbers.html), though it seems to be rarely used.
Generation for this variant is given in the %hebrew-alternate and %hebrew-alternate-intext
rulesets given at the end of the code block below.

%hebrew-item:
-x: −>>;
x.x: =#,##0.00=;
0: ״;
1: א;
2: ב;
3: ג;
4: ד;
5: ה;
6: ו;
7: ז;
8: ח;
9: ט;
10: י[>>];
15: טו;
16: טז;
17: י>>;
20: כ[>>];
30: ל[>>];
40: מ[>>];
50: נ[>>];
60: ס[>>];
70: ע[>>];
80: פ[>>];
90: צ[>>];
100: ק[>>];
200: ר[>>];
298: רחצ;
299: ר>>;
300: ש[>>];
304: דש;
305: ש>>;
344: שדמ;
345: ש>>;
400: ת[>>];
500: תק[>>];
600: תר[>>];
698: תרחצ;
699: תר>>;
700: תש[>>];
744: תשדמ;
745: תש>>;
800: תת[>>];
900: תתק[>>];
1000/100: תתר[>>];
1100/100: תתש[>>];
1200/100: תתת[>>];
1300/100: תתתק[>>];
1400/100: תתתר[>>];
1500/100: תתתש[>>];
1600/100: תתתת[>>];
1700/100: תתתתק[>>];
1800/100: תתתתר[>>];
1900/100: תתתתש[>>];
2000/100: תתתתת[>>];
2100: =#,##0=;


%hebrew:
-x: −>>;
x.x: =#,##0.00=;
0: =%hebrew-item=;
1000: <%hebrew<׳ >>;
1000000/1000: <%hebrew<׳ >>>;
1000000000/1000: <%hebrew<׳ >>>;
1000000000000/1000: <%hebrew<׳ >>>;
1000000000000000/1000: <%hebrew<׳ >>>;
1000000000000000000: =#,##0=;


%%hebrew-0-99-intxt:
0: ׳;
1: ״=%hebrew-item=;
11: י״>%hebrew-item>;
15: ט״ו;
16: ט״ז;
17: י״>%hebrew-item>;
20: ״כ;
21: כ״>%hebrew-item>;
30: ״ל;
31: ל״>%hebrew-item>;
40: ״מ;
41: מ״>%hebrew-item>;
50: ״נ;
51: נ״>%hebrew-item>;
60: ״ס;
61: ס״>%hebrew-item>;
70: ״ע;
71: ע״>%hebrew-item>;
80: ״פ;
81: פ״>%hebrew-item>;
90: ״צ;
91: צ״>%hebrew-item>;


%hebrew-intext:
-x: −>>;
x.x: =#,##0.00=;
0: =%hebrew-item=׳;
11: י״>%hebrew-item>;
15: ט״ו;
16: ט״ז;
17: י״>%hebrew-item>;
20: כ׳;
21: כ״>%hebrew-item>;
30: ל׳;
31: ל״>%hebrew-item>;
40: מ׳;
41: מ״>%hebrew-item>;
50: נ׳;
51: נ״>%hebrew-item>;
60: ס׳;
61: ס״>%hebrew-item>;
70: ע׳;
71: ע״>%hebrew-item>;
80: פ׳;
81: פ״>%hebrew-item>;
90: צ׳;
91: צ״>%hebrew-item>;
100: ק>%%hebrew-0-99-intxt>;
200: ר>%%hebrew-0-99-intxt>;
298: רח״צ;
299: ר>%%hebrew-0-99-intxt>;
300: ש>%%hebrew-0-99-intxt>;
304: ד״ש;
305: ש>%%hebrew-0-99-intxt>;
344: שד״מ;
345: ש>%%hebrew-0-99-intxt>;
400: ת>%%hebrew-0-99-intxt>;
500: ת״ק;
501: תק>%%hebrew-0-99-intxt>;
600: ת״ר;
601: תר>%%hebrew-0-99-intxt>;
698: תרח״צ;
699: תר>%%hebrew-0-99-intxt>;
700: ת״ש;
701: תש>%%hebrew-0-99-intxt>;
744: תשד״מ;
745: תש>%%hebrew-0-99-intxt>;
800: ת״ת;
801: תת>%%hebrew-0-99-intxt>;
900: תת״ק;
901: תתק>%%hebrew-0-99-intxt>;
1000: <%hebrew<׳ >>;
1000000/1000: <%hebrew<׳ >>>;
1000000000/1000: <%hebrew<׳ >>>;
1000000000000/1000: <%hebrew<׳ >>>;
1000000000000000/1000: <%hebrew<׳ >>>;
1000000000000000000: =#,##0=;



%hebrew-alternate:
-x: −>>;
x.x: =#,##0.00=;
0: =%hebrew-item=;
500: ך[>>];
600: ם[>>];
700: ן[>>];
800: ף[>>];
900: ץ[>>];
1000: <<׳ >>;
1000000/1000: <%hebrew-alternate<׳ >>>;
1000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000000000: =#,##0=;

%hebrew-alternate-intext:
-x: −>>;
x.x: =#,##0.00=;
0: =%hebrew-intext=;
500: ך>%%hebrew-0-99-intxt>;
600: ם>%%hebrew-0-99-intxt>;
700: ן>%%hebrew-0-99-intxt>;
800: ף>%%hebrew-0-99-intxt>;
900: ץ>%%hebrew-0-99-intxt>;
1000: <%hebrew-alternate<׳ >>;
1000000/1000: <%hebrew-alternate<׳ >>>;
1000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000000/1000: <%hebrew-alternate<׳ >>>;
1000000000000000000: =#,##0=;
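
The difference between the default and the alternate notation for 500 to 900 can be summarised in a short Python sketch. The letters are copied from the %hebrew-item and %hebrew-alternate rulesets above; the table and function names are hypothetical.

```python
# Default notation: tav plus further hundreds letters (from %hebrew-item).
TAV_HUNDREDS = {500: "תק", 600: "תר", 700: "תש", 800: "תת", 900: "תתק"}
# Alternate notation: one "final" letter each (from %hebrew-alternate).
FINAL_HUNDREDS = {500: "ך", 600: "ם", 700: "ן", 800: "ף", 900: "ץ"}


def hundreds_letters(n: int, alternate: bool = False) -> str:
    """Letters for the hundreds part of a number in 500..999."""
    if not 500 <= n <= 999:
        raise ValueError("this sketch only covers 500..999")
    table = FINAL_HUNDREDS if alternate else TAV_HUNDREDS
    return table[n // 100 * 100]
```

Only the hundreds part differs between the two notations; the tens and units are formatted the same way in both rulesets.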

comment:4 Changed 4 years ago by grhoten

Kent, can you clarify the context of these numbering types?

The default for the other numbering systems is numbering, which I've used in the context of "item number two" or in isolation instead of "the second item" or "two items". Cardinals, ordinals and years are variants. Your proposal seems to break that consistency.

Also the "alternate" type does not convey enough information on what the context would typically be. When would it be used? The web sites do not make the context that clear.

The intext variant implies cardinality. Are you sure that it's not cardinal?

comment:5 Changed 4 years ago by kent.karlsson14@…

These are not "spellout-", they are "Numbering systems", and (for some reason) all go into root. There is no default for numbering systems.

comment:6 Changed 4 years ago by grhoten

Yes, but I still need context for the rules and their names.

comment:7 Changed 4 years ago by emmons

  • Milestone changed from 25rc to 26rc

comment:8 Changed 4 years ago by grhoten

%hebrew is already used in the calendaring system of CLDR, and I've been told that the current behavior is correct. So I will have to pivot these rules so that the %hebrew results remain unchanged.

comment:9 Changed 4 years ago by grhoten

  • Status changed from assigned to reviewing
  • Review set to emmons

comment:10 Changed 4 years ago by emmons

  • Status changed from reviewing to closed
  • Resolution set to fixed

comment:11 Changed 3 years ago by markus

  • Phase set to rc
  • Milestone changed from 26rc to 26

comment:12 Changed 3 years ago by pedberg

  • Status changed from closed to accepted
  • Resolution fixed deleted

comment:13 Changed 3 years ago by pedberg

  • Cc grhoten, emmons, pedberg added

This is not working correctly and causes regressions; see http://bugs.icu-project.org/trac/ticket/11219.

Reopened, needs to be fixed for CLDR 26. If we can't fix it quickly for CLDR 26 then we need to back out the changes completely for 26 and move the bug to CLDR 27.


comment:14 Changed 3 years ago by grhoten

  • Status changed from accepted to reviewing
  • Review changed from emmons to pedberg

comment:15 follow-up: ↓ 16 Changed 3 years ago by grhoten

I reverted the thousands part of %hebrew. The rest remains unchanged.

comment:16 in reply to: ↑ 15 Changed 3 years ago by kent.karlsson14@…

Replying to grhoten:

I reverted the thousands part of %hebrew. The rest remains unchanged.

Except that that was the wrong fix. %hebrew as I submitted it (see above) minus the spaces, if indeed the spaces are causing a problem (I would actually think not), would have been the correct fix.

comment:17 follow-up: ↓ 19 Changed 3 years ago by grhoten

If you (Kent) can get תשע״ד to round trip as 5774, then please submit those changes as a separate ticket. I confirmed with a Hebrew speaker, that תשע״ד is correct for the year 5774.

comment:18 Changed 3 years ago by pedberg

The Hebrew date format -> parse roundtrip is still broken with the current state of this (r10944). I think we need to back out all changes under this ticket, revert to the previous version of this for CLDR 26, and consider these changes for CLDR 27. We are out of time.


comment:19 in reply to: ↑ 17 ; follow-up: ↓ 20 Changed 3 years ago by kent.karlsson14@…

Replying to grhoten:

If you (Kent) can get תשע״ד to round trip as 5774, then please submit those changes as a separate ticket. I confirmed with a Hebrew speaker, that תשע״ד is correct for the year 5774.

The rules in "comment 3" do not seem to generate any roundtrip error messages in http://st.unicode.org/cldr-apps/numbers.jsp, not even for 5774.

But תשע״ד does not say "5774", it says "774", and roundtrips just fine as that.

Omitting the thousands part in year numbers (like omitting the "century number" in year numbers [yy]) is a completely different ballgame than formatting a "full" given number...
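
The point that תשע״ד reads as 774 can be checked by summing letter values, as in the following Python sketch. It is a naive letter sum, not the RBNF parser; the function name `letter_sum` is an assumption made for this example, and punctuation such as geresh and gershayim is simply treated as contributing nothing.

```python
# Letter values as used by the rules in this ticket.
VALUES = dict(zip("אבגדהוזחט", range(1, 10)))       # 1..9
VALUES.update(zip("יכלמנסעפצ", range(10, 100, 10)))  # 10..90
VALUES.update(zip("קרשת", range(100, 500, 100)))     # 100..400


def letter_sum(text: str) -> int:
    """Sum the values of the Hebrew letters; other characters count as 0."""
    return sum(VALUES.get(ch, 0) for ch in text)


print(letter_sum("תשע״ד"))  # 774 = 400 + 300 + 70 + 4
```

Nothing in the string carries the value 5000, so a parser that only sees these letters has no way to recover 5774 without extra context.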

comment:20 in reply to: ↑ 19 Changed 3 years ago by pedberg

Replying to kent.karlsson14@…:

But תשע״ד does not say "5774", it says "774", and roundtrips just fine as that.

Yes, that is a special hack in ICU date formatting and parsing for he@calendar=hebrew: for the current millennium (5000) only, the thousands part is dropped, which is a Hebrew convention. This is handled in ICU and should not be part of the rules in CLDR.
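
The convention described here can be sketched in Python as follows. This is an illustration only and not ICU's implementation: the function names, the `millennium` parameter, and the punctuation placement (gershayim before the last letter, geresh after a single letter) are assumptions made for this example.

```python
GERESH = "\u05F3"
GERSHAYIM = "\u05F4"

# (value, letter) pairs in descending order.
LETTERS = [(400, "ת"), (300, "ש"), (200, "ר"), (100, "ק"),
           (90, "צ"), (80, "פ"), (70, "ע"), (60, "ס"), (50, "נ"),
           (40, "מ"), (30, "ל"), (20, "כ"), (10, "י"),
           (9, "ט"), (8, "ח"), (7, "ז"), (6, "ו"), (5, "ה"),
           (4, "ד"), (3, "ג"), (2, "ב"), (1, "א")]


def additive(n: int) -> str:
    """Plain additive spelling, largest letters first."""
    out = ""
    for value, letter in LETTERS:
        while n >= value:
            if n in (15, 16):  # written 9 + 6 and 9 + 7
                return out + ("טו" if n == 15 else "טז")
            out += letter
            n -= value
    return out


def short_hebrew_year(year: int, millennium: int = 5000) -> str:
    """Format a year of the given millennium without its thousands part."""
    if not millennium < year < millennium + 1000:
        raise ValueError("this sketch only covers years inside the millennium")
    letters = additive(year - millennium)
    if len(letters) == 1:
        return letters + GERESH
    return letters[:-1] + GERSHAYIM + letters[-1]


print(short_hebrew_year(5774))  # תשע״ד
```

Because the millennium is supplied by the caller rather than by the numeral itself, this logic sits naturally in the date formatter and not in a general number-formatting ruleset.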


comment:21 Changed 3 years ago by pedberg

  • Status changed from reviewing to accepted

comment:22 Changed 3 years ago by grhoten

After a little more integration testing with ICU, I suspect that there are 2 separate problems.

1) Formatting compatibility with previous results.
2) Parsing is a problem with multiple variants of %hebrew.

I backed out the change for now. We need a separate ticket to reintegrate this.

comment:23 Changed 3 years ago by pedberg

From the review there still seems to be a net change involving the deletion of the following rules:

    <rbnfrule value="200">=%hebrew=[׳];</rbnfrule> 
    <rbnfrule value="300">=%hebrew=[׳];</rbnfrule> 
    <rbnfrule value="400">=%hebrew=;</rbnfrule> 

If there were no net change we would not necessarily need a new bug, we should just move this one along. But I will run the ICU tests with the rules as they are now.

comment:24 Changed 3 years ago by pedberg

Well, 10945 does pass the ICU tests (C and J). That's good.

comment:25 Changed 3 years ago by grhoten

The results are the same. It’s faster to parse the rules and input string too. The net changes came from a previous revision that I was trying out.

I wouldn’t worry about the current difference.

comment:26 Changed 3 years ago by pedberg

  • Status changed from accepted to closed
  • Xref set to 7888
  • Resolution set to fixed
  • Summary changed from Hebrew item and page numbers (RBNF) to Hebrew item and page numbers (RBNF) - initial tests, mostly rolled back out

Thanks. I have spun off cldrbug 7888 to continue this work in a future version of CLDR.
