
CLDR Ticket #8088 (accepted data)

Opened 3 years ago

Last modified 2 years ago

Improve Italian Rule-Based Number Formatting rules

Reported by: sascha@…
Owned by: grhoten
Component: other
Data Locale: it
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

The current RBNF rules for Italian sometimes produce wrong output. For example, %spellout-cardinal-feminine for 31 should give trentun instead of trentuna; in correct Italian, it should be ci sono trentun macchine, not *trentuna macchine.

We have asked an Italian linguist to review and correct the RBNF rules in CLDR. The updated file is attached. Also, please find attached the test data file we're using internally for our unit tests. The format is as follows: {rbnf-rule}TAB{number}TAB{formatted-number}
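The tab-separated format described above can be read with a few lines of Python. This is only a minimal sketch: the names `RbnfCase` and `parse_test_data` are hypothetical, the sample rows are built from examples in this ticket rather than copied from testdata-it.txt, and numbers are kept as strings because the ticket does not say whether the file contains non-integers.

```python
from typing import List, NamedTuple


class RbnfCase(NamedTuple):
    """One test row: {rbnf-rule}TAB{number}TAB{formatted-number}."""
    rule: str
    number: str      # kept as text; the ticket does not specify the numeric type
    expected: str


def parse_test_data(text: str) -> List[RbnfCase]:
    """Parse tab-separated test rows, skipping blank lines."""
    cases = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        cases.append(RbnfCase(*(field.strip() for field in fields)))
    return cases


# Illustrative rows only (assumption: not taken from the attached file).
sample = (
    "%spellout-cardinal-feminine\t31\ttrentun\n"
    "%spellout-numbering\t1\tuno\n"
)
cases = parse_test_data(sample)
```

Each parsed case could then be fed to whatever RBNF implementation is under test and compared against `expected`.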

Attachments

it.txt (17.8 KB) - added by sascha@… 3 years ago.
Corrected RBNF rules for Italian
testdata-it.txt (23.6 KB) - added by sascha@… 3 years ago.
Tests for Italian RBNF rules

Change History

Changed 3 years ago by sascha@…

Corrected RBNF rules for Italian

Changed 3 years ago by sascha@…

Tests for Italian RBNF rules

comment:1 follow-up: ↓ 10 Changed 3 years ago by sascha@…

Examples:

%spellout-numbering ; 1 ; uno ; il numero uno
%spellout-cardinal-masculine ; 1 ; un ; un giorno
%spellout-cardinal-feminine ; 1 ; una ; una settimana
%spellout-ordinal-masculine ; 1 ; primo ; il primo giorno del mese
%spellout-ordinal-feminine ; 1 ; prima ; la prima settimana del mese
%spellout-ordinal-masculine-singular ; 1 ; il primo giorno di scuola è bello
%spellout-ordinal-masculine-plural ; 1 ; i primi giorni di scuola sono belli
%spellout-ordinal-feminine-singular ; 1 ; la prima settimana di scuola è bella
%spellout-ordinal-feminine-plural ; 1 ; le prime settimane di scuola sono belle

comment:2 Changed 3 years ago by grhoten

  • Cc grhoten added

We seem to be going back and forth on these spellings. I think we're going to need more formal references to back up this request.

comment:3 follow-ups: ↓ 4 ↓ 8 Changed 3 years ago by spareti@…

Hello,

Changes are of two types:

1) fixing incorrect spellings. For example, a feminine ending for cardinal numbers does not exist in Italian (the only exception is 1, since it is also an article and therefore takes the required inflections). Numerals ending in 1 (e.g. 21, 301, 1001) do not have a feminine form unless they are written separately and followed by a singular feminine noun (e.g. 101 friends(f,pl) would be 'centoun amiche(f,pl)' or 'cento e uno(m,s) amiche(f,pl)' or (rarely) 'cento e una(det,f,s) amica(f,s)').

2) improving acceptable but less common or correct spellings. There are many debatable cases in this group, in particular the final vowel apocope, the elision of internal vowels when joining complex numerals (e.g. centouno or centuno, centootto or centotto) and whether to use the separate or joined form (cento e due or centodue).

Decisions were made based on the following sources (and some additional ones):
Treccani (one of the best encyclopedias in Italy)
http://www.treccani.it/enciclopedia/numerali_%28Enciclopedia_dell%27Italiano%29/
Discussions on special cases from Accademia della Crusca (the most prominent research institution on Italian language and the first linguistic academy), e.g.:
http://www.accademiadellacrusca.it/it/lingua-italiana/consulenza-linguistica/domande-risposte/elisione-troncamento-nellitaliano-contempora
For cases where it was only a matter of stylistic preference, I consulted the Web for frequency of occurrence and a pool of 5 Italian linguists from my team.

As already pointed out, some spelling adjustments are required for 1 and for numerals ending in 1, depending on the phonology of the noun they modify. These are needed for correctness; some additional adjustments, e.g. for numerals ending in a vowel, are just stylistic. However, this cannot be easily handled at this level since ICU is not context aware. We would need compositional rules based on how the following noun is pronounced. For native Italian words this can be approximated using rules based on the word-initial spelling (e.g. 1 "un" should be "un'" if followed by a feminine noun starting with a vowel, and "uno" if followed by a noun starting with a consonant group such as "sc, gn, ...").
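The word-initial approximation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not part of CLDR or ICU: the function name `one_with_noun` is hypothetical, the prefix list contains only the two groups named in this comment ("sc", "gn"), and the caller is assumed to already know the noun's gender.

```python
VOWELS = "aeiouàèéìòù"

# Assumption: only the consonant groups named in the ticket; a real list
# would need review by a linguist.
UNO_PREFIXES = ("sc", "gn")


def one_with_noun(noun: str, feminine: bool) -> str:
    """Approximate the form of 1 before a native Italian noun from its spelling."""
    if not noun:
        raise ValueError("noun must not be empty")
    start = noun.lower()
    if feminine:
        if start[0] in VOWELS:
            # "un'" attaches directly to the noun, with no space.
            return "un'" + noun
        return "una " + noun
    if start.startswith(UNO_PREFIXES):
        return "uno " + noun
    return "un " + noun
```

For instance, `one_with_noun("settimana", feminine=True)` gives "una settimana" and `one_with_noun("giorno", feminine=False)` gives "un giorno", matching the examples in comment:1. Loanwords and pronunciation-based exceptions are out of scope for this spelling-only approximation.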

Let me know if you have any specific doubts.

comment:4 in reply to: ↑ 3 Changed 3 years ago by grhoten

Replying to spareti@…:

As already pointed out, some spelling adjustments are required for 1 and for numerals ending in 1, depending on the phonology of the noun they modify. These are needed for correctness; some additional adjustments, e.g. for numerals ending in a vowel, are just stylistic. However, this cannot be easily handled at this level since ICU is not context aware. We would need compositional rules based on how the following noun is pronounced. For native Italian words this can be approximated using rules based on the word-initial spelling (e.g. 1 "un" should be "un'" if followed by a feminine noun starting with a vowel, and "uno" if followed by a noun starting with a consonant group such as "sc, gn, ...").

Maybe you can clarify what you're saying here. We have the masculine and feminine forms in addition to the numbering form. Are we missing some variants? The limitations of ICU should not be taken into account. If word choice depends on the vowel properties of the word, then we should make that additional variant available. So far, detection of gender, grammatical number, case, vowel properties and other grammatical properties of words has only been available outside of ICU. Either the translator will know the context of the word, or a more informed framework than ICU will know the context when choosing the correct variant.

comment:5 Changed 2 years ago by markus

  • Type set to data

comment:6 Changed 2 years ago by markus

  • Component changed from data-other to other

comment:7 Changed 2 years ago by grhoten

  • Owner changed from emmons to grhoten
  • Status changed from new to accepted

comment:8 in reply to: ↑ 3 Changed 2 years ago by kent.karlsson14@…

Replying to spareti@…:

2) improving acceptable but less common or correct spellings. There are many debatable cases in this group, in particular the final vowel apocope, the elision of internal vowels when joining complex numerals (e.g. centouno or centuno, centootto or centotto) and whether to use the separate or joined form (cento e due or centodue).

...

Some additional adjustments, e.g. for numerals ending in a vowel, are just stylistic

The simpler the better. If the complication of vowel elision is optional, then don't do it. Simpler rules, simpler maintenance, less risk of errors. And in this case, for the "end users" (readers, listeners), more straightforward spelling and clearer (more articulated) pronunciation.

comment:9 Changed 2 years ago by grhoten

@sascha, I got the following feedback on this topic.

If I get it right, this is the only pending point.

I think that in "common language", the only exception is with one.

With all other numbers, including numbers ending with 1, we can use the masculine form ending with "uno":
31 : trentuno
21 : ventuno
11 (exception): undici
101 : centouno or cento uno or cento e uno or centuno
1001: milleuno or mille uno or mille e uno

All of the above don't change with cardinality or gender.
I'm not sure which form is to be preferred.
I don't think that separating the forms "trentuno" and "trentun" based on the first following letter, is so important (the above are correct, in both cases).
But, it is important for 1:

1, even keeping the semantic meaning of the number, behaves as the indefinite article.
This means it can be
uno - masculine + consonant
una - feminine + consonant
un - masculine + vowel
un' - feminine + vowel (this one collapses the space before the noun)

After reviewing the rules, I found a few problems.

  1. You removed the [optional] notation. This is bad. It will force a zero there when one is not desired.
  2. You denormalized the rule order by putting it in a different order. This makes it hard to compare what changes you're proposing.
  3. There is no need for the redundant singular variants. It makes it harder to vet. Other languages don't add them. The singular form is usually assumed, and plural is added as an exceptional case.
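The effect described in point 1 can be illustrated with a toy model of a single rule. This is a sketch only: it does not call ICU, the rule shapes `20: venti[>>];` and `20: venti>>;` are illustrative rather than copied from the CLDR Italian data, and the function name `spell_twenties` is hypothetical.

```python
# Unit words as they appear inside a compound (note the accent in "tré").
COMPOUND_UNITS = ["zero", "uno", "due", "tré", "quattro",
                  "cinque", "sei", "sette", "otto", "nove"]


def spell_twenties(n: int, optional_remainder: bool = True) -> str:
    """Toy model of one rule for 20..29.

    optional_remainder=True  models a rule shaped like '20: venti[>>];'
    optional_remainder=False models the same rule without brackets.
    """
    if not 20 <= n <= 29:
        raise ValueError("this sketch only covers 20..29")
    remainder = n - 20
    if optional_remainder and remainder == 0:
        # Bracketed text is dropped when the number is an exact multiple
        # of the rule's divisor.
        return "venti"
    unit = COMPOUND_UNITS[remainder]
    # Elide the final vowel of "venti" before a vowel (ventuno, ventotto).
    stem = "vent" if unit[0] in "aeiou" else "venti"
    return stem + unit
```

In this model, `spell_twenties(20)` yields "venti", while `spell_twenties(20, optional_remainder=False)` yields "ventizero", which is the unwanted zero mentioned above.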

Adding the plural forms is not an issue if you have a need for it.

I'm also fine with adding variants for consonantic groups or vowel elisions, but your proposal doesn't have those.

For the stylistic differences, I'm not very convinced that it's important to change those spellings. Unfortunately, it's hard to determine which are stylistic changes or real spelling errors. I've seen some Italians from different regions of Italy have this discussion. They have a preference, but they recognize the other forms when it's a stylistic difference.

Since the brackets were missing in your submitted rules, I definitely recommend using the Number Format Tester and using that with your Italian linguists to explore how the numbers really appear.

comment:10 in reply to: ↑ 1 ; follow-up: ↓ 11 Changed 2 years ago by kent.karlsson14@…

Replying to sascha@…:

%spellout-ordinal-masculine-plural ; 1 ; i primi giorni di scuola sono belli

This example is bogus. It does not make sense to say "i secondi giorni di scuola sono belli". (Which days are those?) (In English, you can say "the first few days...", but it does not make sense to say "the second few days...", etc.) "Primi"/"first" is then not used as a count word; it just means "initial".

Plural count words are used in some languages for when what is counted/ordered is delimitable *sets*, like in "the 25th Olympic games". But it does not make sense when there is no clear delimitation of the sets.

comment:11 in reply to: ↑ 10 Changed 2 years ago by spareti@…

Replying to kent.karlsson14@…:

Replying to sascha@…:

%spellout-ordinal-masculine-plural ; 1 ; i primi giorni di scuola sono belli

This example is bogus. It does not make sense to say "i secondi giorni di scuola sono belli". (Which days are those?) (In English, you can say "the first few days...", but it does not make sense to say "the second few days...", etc.) "Primi"/"first" is then not used as a count word; it just means "initial".

Plural count words are used in some languages for when what is counted/ordered is delimitable *sets*, like in "the 25th Olympic games". But it does not make sense when there is no clear delimitation of the sets.

That's a good point; it's correct, but artificial. In fact, 'primi' and 'secondi' used in that context have different meanings. 'Primi' does indeed mean 'first few days', while 'secondi' means 'the second days' (e.g. all second days of each year), not 'the second few days'. Here are some examples that would make more sense:
'I secondi figli sono meno viziati' EN: 'Second children are less spoiled'
'I primi/secondi arrivati non vincono niente' EN: 'The ones arriving first/second don't win anything'

Other than delimitable sets, plural ordinals are used for words lacking a singular, e.g.:
'I secondi pantaloni che ho provato' EN:'The second pair of trousers I tried on'
'Le terze nozze' EN: 'the third marriage'
