CLDR Ticket #8361 (closed: fixed)

Opened 4 years ago

Last modified 4 years ago

Finnish RBNF - add missing word inflections

Reported by: pedberg
Component: numbers-rbnf Data Locale: fi
Phase: rc Review: mark
Weeks: Data Xpath:



We have received feedback about missing word inflections in Finnish RBNF. Updated Finnish RBNF rules are attached, in a format similar to the ICU format.


fi.txt (32.3 KB) - added by pedberg 4 years ago.

Change History

Changed 4 years ago by pedberg

comment:1 Changed 4 years ago by kent.karlsson14@…

This is going terribly overboard, and is not helpful. Adding all(?) theoretically possible variants is not a good idea, and is a nightmare to vet. Only those variants that are actually useful in practice, i.e. sufficiently common (for some value of "sufficient"), should be added to CLDR/RBNF/fi. Rarely used (for some value of "rare") variants should just be skipped for CLDR.

Please provide a use case analysis that can be used to judge which variants should be added. Adding 44 variants, most of which are useless in practice, should not be done.

comment:2 Changed 4 years ago by kent.karlsson14@…

There is an architectural issue, regarding ruleset naming, in the submitted suggestion. The convention is that when there are variants (more than one in a particular dimension), every variant carries the variant name in its ruleset name, including the default one.

I.e., here:
cardinal -> cardinal-nominative
ordinal -> ordinal-nominative
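
To illustrate the naming point above, here is a minimal sketch, not taken from the attached fi.txt. The function name explicit_variant_names, the CASES tuple (an assumed subset of case names), and the sample ruleset names are all hypothetical. It renames a bare ruleset only when a case-suffixed sibling exists, so rulesets without variants are left alone.

```python
# Hypothetical sketch: the case list and ruleset names are illustrative
# assumptions, not the contents of the attached rules file.
CASES = ("nominative", "genitive", "partitive")  # assumed subset of cases


def explicit_variant_names(rulesets, default_case="nominative", cases=CASES):
    """Return ruleset names with the default case made explicit.

    A bare name such as "cardinal" is renamed to "cardinal-nominative"
    only if at least one sibling like "cardinal-genitive" is present.
    """
    names = set(rulesets)
    renamed = []
    for name in rulesets:
        # Does this name have at least one case-suffixed sibling?
        has_variants = any(f"{name}-{case}" in names for case in cases)
        if has_variants and f"{name}-{default_case}" not in names:
            renamed.append(f"{name}-{default_case}")
        else:
            renamed.append(name)
    return renamed


sample = ["cardinal", "cardinal-genitive", "ordinal", "ordinal-partitive", "numbering"]
result = explicit_variant_names(sample)
```

With the sample list, the bare cardinal and ordinal entries gain the -nominative suffix, while numbering, which has no variants here, keeps its name.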

comment:3 Changed 4 years ago by grhoten

  • Owner changed from emmons to grhoten
  • Status changed from new to reviewing
  • type set to data
  • Review set to erkki
  • Milestone changed from UNSCH to 28

Finnish is a highly inflectional language, and this is not theoretical. RBNF is used in pronouncing numbers, which can be found in dates, times, ranges of dates/times, counting units (kilometers and such), and a whole bunch of stuff like that. Erkki previously mentioned that there are a large number of grammatical cases used and needed in Finnish.

You may also be interested in this paper: http://web.stanford.edu/~laurik/publications/NumbersNumerals.pdf

I met Lauri Karttunen (the author) once. He also agreed that if you want to count or order something, and you need to correctly pronounce it in a sentence, you need all those forms.

As in the RBNF rules for other languages, the singular-nominative form is assumed unless specified otherwise. A lot of other RBNF rules already follow this model for the cardinal and ordinal forms.

This work looks like a continuation of cldrbug:5363.

comment:4 Changed 4 years ago by grhoten

  • Xref set to 5363

comment:5 Changed 4 years ago by markus

  • Component changed from data-other to rbnf

comment:6 Changed 4 years ago by kent.karlsson14@…

The paper you refer to does not give any list.

You may find https://www.cs.tut.fi/~jkorpela/finnish-cases.html interesting, including the list with usage frequency. Not at all close to 44 variants, and the last few are so rare that one should ignore them for RBNF. That would leave somewhere between 10 and 20 variants...

I seriously doubt that the "average native language user" ever uses anywhere near 44 variants, or even understands many of them.

comment:7 Changed 4 years ago by grhoten

Note that the provided rules misspell "genitive" as "genetive". That should be fixed.

comment:8 Changed 4 years ago by grhoten

  • Review changed from erkki to emmons

I just fixed the misspelling, but it seems that erkki is no longer active, so I'm assigning it to a new reviewer.

comment:9 Changed 4 years ago by mark

  • Review changed from emmons to mark

comment:10 Changed 4 years ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed

