CLDR Ticket #7833 (accepted data)
Redundant plural rules for Tagalog and Filipino?
Reported by: | verdy_p@… | Owned by: | mark |
---|---|---|---|
Component: | plurals | Data Locale: | tl, fil |
Phase: | dsub | Review: | |
Weeks: | | Data Xpath: | |
Xref: | | | |
Description
The current plural rules for Tagalog and Filipino are unnecessarily redundant:
one:
v = 0 and i = 1,2,3 or
v = 0 and i % 10 != 4,6,9 or
v != 0 and f % 10 != 4,6,9
The first alternative (for the integers 1, 2, 3) can only be true when the second alternative is also true (integers whose units digit is not 4, 6, or 9).
The LDML specification says that "plural rules must be mutually exclusive" (to be "self-contained and not depend on the ordering" of the syntax); this does not hold for the first two alternatives.
The first alternative is not necessary at all! The rules are equivalent to
one:
v = 0 and i % 10 != 4,6,9 or
v != 0 and f % 10 != 4,6,9;
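The claimed equivalence can be checked mechanically. The following is a minimal sketch in Python, not CLDR code: the helper names (`operands`, `one_current`, `one_reduced`) are hypothetical, the operands i, v, f are derived from a formatted string (integer part, number of visible fraction digits, visible fraction digits as an integer), and only non-negative numbers written with a "." separator are assumed.

```python
def operands(formatted):
    """Return (i, v, f) for a formatted number such as "1.40"."""
    int_part, _, frac = formatted.partition(".")
    i = int(int_part)               # integer digits
    v = len(frac)                   # number of visible fraction digits
    f = int(frac) if frac else 0    # visible fraction digits, with trailing zeros
    return i, v, f


def one_current(formatted):
    """The 'one' rule as quoted in the ticket, with three alternatives."""
    i, v, f = operands(formatted)
    return ((v == 0 and i in (1, 2, 3))
            or (v == 0 and i % 10 not in (4, 6, 9))
            or (v != 0 and f % 10 not in (4, 6, 9)))


def one_reduced(formatted):
    """The same rule with the first alternative removed."""
    i, v, f = operands(formatted)
    return ((v == 0 and i % 10 not in (4, 6, 9))
            or (v != 0 and f % 10 not in (4, 6, 9)))


# Integers 0..999 plus a handful of decimal forms (sample set chosen for illustration).
samples = [str(n) for n in range(1000)]
samples += [f"{n}.{d}" for n in range(30)
            for d in ("0", "4", "40", "04", "5", "5004", "9", "13")]

mismatches = [s for s in samples if one_current(s) != one_reduced(s)]
print(mismatches)
```

An empty list here only shows that the two forms agree on the sampled values; the general argument is that 1, 2 and 3 already satisfy `i % 10 != 4,6,9`.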
The only remaining distinction is whether there are visible fraction digits, in which case the last visible digit of the fraction is used instead of the units digit. In summary, all that matters is the last digit displayed, whether in the integer part or in the fraction: the category is singular (one) for the digits 0, 1, 2, 3, 5, 7, 8, and plural (other) when that last digit is 4, 6, or 9:
1.4 is plural (last digit is 4)
1.40 is singular (last digit is 0)
Maybe we could have a new integer operand "u" containing the value of the last displayed component (the integer part when no fraction is displayed, otherwise the visible fraction digits):
displayed : operand
"0" : u=0 (singular)
"0.00" : u=0 (singular)
"0.5" : u=5 (singular)
"0.5004" : u=5004 (plural)
"1" : u=1 (singular)
"4" : u=4 (plural)
"1.4" : u=4 (plural)
"1.40" : u=40 (singular)
In that case the Tagalog/Filipino rule reduces to a single condition (shown below with samples):
one:
u % 10 != 4, 6, 9
@integer 0~3, 5, 7, 8, 10~13, 15, 17, 18, 20, 21, 100, 1000, 10000, 100000, 1000000, ...
@decimal 0.0~0.3, 0.5, 0.7, 0.8, 1.0~1.3, 1.5, 1.7, 1.8, 2.0, 2.1, 10.90, 100.01, 1000.002, 10000.0003, 100000.00005, 1000000.000007, ...;
other:
@integer 4, 6, 9, 14, 16, 19, 24, 104, 1006, 10009, 100004, 1000006, ...
@decimal 0.4, 0.6, 0.9, 1.4, 1.6, 1.9, 2.4, 2.6, 10.9, 100.04, 1000.006, 10000.0009, 100000.000004, 1000000.0000006, ...
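To make the proposed operand concrete, here is a small Python sketch that derives "u" from a formatted string and checks it against the "displayed : operand" table and a subset of the samples above. The operand "u" is only a proposal in this ticket, not an existing LDML operand, and the function names `u_operand` and `category` are hypothetical; non-negative numbers with a "." separator are assumed.

```python
def u_operand(formatted):
    """Hypothetical 'u': the visible fraction digits as an integer,
    or the integer part when no fraction digits are displayed."""
    int_part, _, frac = formatted.partition(".")
    return int(frac) if frac else int(int_part)


def category(formatted):
    # Proposed single-condition rule: one: u % 10 != 4, 6, 9
    return "one" if u_operand(formatted) % 10 not in (4, 6, 9) else "other"


# Rows of the "displayed : operand" table from the ticket.
table = {"0": 0, "0.00": 0, "0.5": 5, "0.5004": 5004,
         "1": 1, "4": 4, "1.4": 4, "1.40": 40}
table_ok = all(u_operand(s) == u for s, u in table.items())

# A subset of the @integer / @decimal samples listed for each category.
one_samples = ["0", "3", "5", "8", "13", "21", "1000000",
               "0.0", "0.3", "1.8", "10.90", "100.01", "1000000.000007"]
other_samples = ["4", "6", "9", "24", "1000006",
                 "0.4", "1.9", "10.9", "100.04", "1000000.0000006"]
samples_ok = (all(category(s) == "one" for s in one_samples)
              and all(category(s) == "other" for s in other_samples))

print(table_ok, samples_ok)
```

Note that "10.90" and "10.9" land in different categories, which is exactly the trailing-zero sensitivity described in the ticket.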
I suspect the intended rule is more complex than that: either an "and" clause is missing from the first condition, or "v = 0" in the first alternative should have been dropped (so that it matches independently of the presence of visible fractions).
Change History
comment:1 Changed 3 years ago by mark
- Owner changed from anybody to mark
- Status changed from new to assigned
- Milestone changed from UNSCH to 27dvet
comment:3 Changed 3 years ago by Eemeli Aro <eemeli@…>
Other locales for which the symbol u would simplify the rules include at least bs/hr/sh/sr, dsb/hsb, lv/prg, and mk.
That's in fact every use of the symbol f, with the exception of lt's "many", which could equivalently be expressed using the symbol t.
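The rewrite rests on u being equal to i when no fraction digits are visible and to f otherwise. As a rough check, the Python sketch below compares the current and u-based forms of the bs/hr/sh/sr and mk rules quoted in this comment over a sample of formatted numbers. The function names are hypothetical, "u" is the operand proposed in this ticket rather than an existing LDML operand, and the sample set is chosen for illustration only.

```python
def operands(formatted):
    """Return (i, v, f, u) for a formatted number such as "1.12"."""
    int_part, _, frac = formatted.partition(".")
    i, v = int(int_part), len(frac)
    f = int(frac) if frac else 0
    u = f if v else i   # proposed operand: fraction digits if visible, else integer
    return i, v, f, u


def bs_now(formatted):
    i, v, f, _ = operands(formatted)
    # one: v = 0 and i % 10 = 1 and i % 100 != 11 or f % 10 = 1 and f % 100 != 11
    if (v == 0 and i % 10 == 1 and i % 100 != 11) or (f % 10 == 1 and f % 100 != 11):
        return "one"
    # few: v = 0 and i % 10 = 2..4 and i % 100 != 12..14 or f % 10 = 2..4 and f % 100 != 12..14
    if ((v == 0 and 2 <= i % 10 <= 4 and not 12 <= i % 100 <= 14)
            or (2 <= f % 10 <= 4 and not 12 <= f % 100 <= 14)):
        return "few"
    return "other"


def bs_with_u(formatted):
    *_, u = operands(formatted)
    if u % 10 == 1 and u % 100 != 11:
        return "one"
    if 2 <= u % 10 <= 4 and not 12 <= u % 100 <= 14:
        return "few"
    return "other"


def mk_now(formatted):
    i, v, f, _ = operands(formatted)
    # one: v = 0 and i % 10 = 1 or f % 10 = 1
    return "one" if (v == 0 and i % 10 == 1) or f % 10 == 1 else "other"


def mk_with_u(formatted):
    *_, u = operands(formatted)
    return "one" if u % 10 == 1 else "other"


samples = [str(n) for n in range(200)]
fractions = ["0", "1", "2", "5", "10", "11", "12", "21", "22", "00", "01", "111", "112"]
samples += [f"{n}.{frac}" for n in range(25) for frac in fractions]

bs_same = all(bs_now(s) == bs_with_u(s) for s in samples)
mk_same = all(mk_now(s) == mk_with_u(s) for s in samples)
print(bs_same, mk_same)
```

The two forms agree because f is 0 whenever v = 0, so the f-based alternative can never match an integer, while the "v = 0" guard keeps the i-based alternative from matching a number with visible fraction digits.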
Here are some of the simplifications that could be made with u:
bs/hr/sh/sr now:
"one": "v = 0 and i % 10 = 1 and i % 100 != 11 or f % 10 = 1 and f % 100 != 11"
"few": "v = 0 and i % 10 = 2..4 and i % 100 != 12..14 or f % 10 = 2..4 and f % 100 != 12..14"
bs/hr/sh/sr with u:
"one": "u % 10 = 1 and u % 100 != 11"
"few": "u % 10 = 2..4 and u % 100 != 12..14"
dsb/hsb now:
"one": "v = 0 and i % 100 = 1 or f % 100 = 1"
"two": "v = 0 and i % 100 = 2 or f % 100 = 2"
"few": "v = 0 and i % 100 = 3..4 or f % 100 = 3..4"
dsb/hsb with u:
"one": "u % 100 = 1"
"two": "u % 100 = 2"
"few": "u % 100 = 3..4"
mk now:
"one": "v = 0 and i % 10 = 1 or f % 10 = 1"
mk with u:
"one": "u % 10 = 1"
comment:5 Changed 2 years ago by verdy_p@…
The reason for this "strange" behaviour, where the same value can be singular or plural, lies in the way numbers are spoken when there are visible fractions: the integer part has its own singular/plural rule (omitted when digits are written, because the decimal separator is just an invariant symbol), then the decimal separator is pronounced, then the fractional part, followed by the unit, which takes its plural form separately.
In other words, when fractions are written (even if they are just zeroes), these fractions take priority.
For this reason, a number displayed as "10" may be plural in some language while "10.0" could be singular, because only the "0" is considered; likewise "10.1" and "10.10" could have different plural categories (just discard the "10." part and consider "1" and "10").
This has a side effect when numbers can be formatted with variable precision. In those languages you need to set the precision for fractions explicitly if the word for the unit following the number is fixed and does not depend on the value. Ideally, though, the plural rules in CLDR should allow choosing the correct word from the formatted number (independently of its internal binary value before formatting, because formatting can introduce rounding).
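The effect of precision can be illustrated with a short Python sketch. It assumes the reduced Tagalog/Filipino rule from the ticket description (last displayed digit not 4, 6, or 9 gives "one"), and the function name `category` is hypothetical. The category is computed from the formatted string, not from the binary value.

```python
def category(formatted):
    """Category under the reduced tl/fil rule, based on the formatted text."""
    int_part, _, frac = formatted.partition(".")
    last_component = int(frac) if frac else int(int_part)
    return "other" if last_component % 10 in (4, 6, 9) else "one"


value = 1.4
for precision in (0, 1, 2):
    text = format(value, f".{precision}f")   # same value, different visible digits
    print(text, category(text))
```

The same internal value is displayed as "1", "1.4", and "1.40", and the second form falls into a different category from the other two, so the plural form has to be selected after formatting.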
comment:6 Changed 2 years ago by mark
- Phase changed from dvet to dsub
- Milestone changed from 27 to 28
comment:9 Changed 2 years ago by mark
- Milestone changed from 28 to 29
After data submission, the priority for these drops, so moving to the start of the next cycle.
comment:10 Changed 2 years ago by verdyp@…
This bug was submitted long before the data submission and has already been
postponed several times. Apparently you don't take the initial request
seriously, and each time you defer it to a future milestone it is forgotten.
Is it so complex to handle?
comment:11 Changed 22 months ago by emmons
- Milestone changed from 29 to upcoming
Auto move of all 29 -> upcoming