CLDR Ticket #11302 (new data)

Opened 3 months ago

Last modified 3 months ago

Ordinal minimal pairs for French

Reported by: Marcel Schneider <charupdate@…>
Owned by: anybody
Component: main
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

The ordinal minimal pairs for French and other languages currently have two issues, one related to grammar and one to digital representation. The current fields for French do not allow entering data that conforms to the CLDR specifications, even though these items show up in the dashboard as if they could be completed.

Consequently, this ticket is filed following the instructions found in the guidelines:

“When you find errors or omissions in this data, please report the information with a bug report.”

1. Grammar

The CLDR submission guidelines have been enhanced with a new instruction:

“Warning for Vetters
The minimal pairs in the Survey Tool are not direct translations of English. They may be translations of English, such as in German, but must be different if those words or terms do not show the right plural differences for your language. […]”

http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Determining-Plural-Categories

The current data structure, with the categories zero, one, two, few, many, and other, is derived from the needs of the English language, which seem to fit almost every locale. French, and probably a number of other languages written in Latin script, fall outside this scheme because they form short ordinals by appending the ending of the long form, in superscript letters, to the number. This introduces differences by gender and by number.

Providing a usefully complete set of minimal pairs for French requires extending the DTD with two new attributes: gender and number. With these, the two current ordinalMinimalPairs found in /main/fr.xml expand to up to eight pairs. Additionally, the original sample could be streamlined by shortening it to a more generic pattern. This has been done in the data proposed in this ticket.

2. Digital representation

In some languages, including English and French but perhaps not Italian, Portuguese, and Spanish, using ASCII fallbacks instead of superscript letters has become so common that it is now accepted and has become a new standard. However, this representation is not considered pure and, when used in more formal material such as high-quality user interfaces, still counts as a sloppy or lazy implementation of these languages.

Emulating the superscript letters with superscript formatting is convenient for some uses but is not interoperable, as it relies on higher-level protocols and therefore contradicts the Unicode design principle that every encoded language must be correctly displayable in plain text. Hence, emulating superscript letters in abbreviations such as ordinals through formatting at the rendering level is not considered a correct digital representation, whereas using preformatted superscript Latin letters for these abbreviations follows the Unicode design principles of accuracy and interoperability.

CLDR features data in plain text without higher level protocols, and must therefore use the preformatted superscript letters wherever regular rendering uses superscript. Depending on the font size, an implementation may convert the superscripts to base letters, but this mapping must not be enforced by default in source data.
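The optional conversion to base letters mentioned above could look roughly like the following sketch. The table and the function name `to_base_letters` are hypothetical and only cover the superscript letters listed in this ticket; the point is that the fallback is applied by the implementation at display time, while the source data keeps the superscript letters.

```python
# Hypothetical fallback table: preformatted superscript letter -> base letter.
# Only the letters mentioned in this ticket are covered.
SUPERSCRIPT_TO_BASE = {
    "\u1d48": "d",  # ᵈ
    "\u1d49": "e",  # ᵉ
    "\u02b0": "h",  # ʰ
    "\u207f": "n",  # ⁿ
    "\u02b3": "r",  # ʳ
    "\u02e2": "s",  # ˢ
    "\u1d57": "t",  # ᵗ
}
_TABLE = str.maketrans(SUPERSCRIPT_TO_BASE)


def to_base_letters(text: str) -> str:
    """Return text with the superscript letters above replaced by base letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # The source string keeps the superscripts; the fallback is opt-in.
    print(to_base_letters("la 1\u02b3\u1d49"))
```

Characters outside the table are left untouched, so the function can be applied to a whole formatted string.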

English and French are more affected than Italian, Portuguese and Spanish, because the former use a non-trivial number of superscript letters, whereas the latter mainly use two, which were encoded in Latin-1 because there was enough code space:

Italian, Portuguese, Spanish: ª, º (plus possibly ˢ, but then also ᵃ and ᵒ)
English: ᵈ, ʰ, ⁿ, ʳ, ˢ, ᵗ (6)
French: ᵈ, ᵉ, ʳ, ˢ (4)
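To make the digital representation explicit, the following sketch prints the code point and Unicode character name of each letter in the lists above. The dictionary `LETTERS` and the helper `describe` are illustrative names, and the code points are written as escapes on the assumption that the letters shown here are the usual modifier and superscript letters.

```python
import unicodedata

# Letters from the lists above, written as escapes to avoid ambiguity.
LETTERS = {
    "Italian, Portuguese, Spanish": "\u00aa\u00ba",          # ª º
    "English": "\u1d48\u02b0\u207f\u02b3\u02e2\u1d57",       # ᵈ ʰ ⁿ ʳ ˢ ᵗ
    "French": "\u1d48\u1d49\u02b3\u02e2",                    # ᵈ ᵉ ʳ ˢ
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for language, letters in LETTERS.items():
        print(language)
        for ch in letters:
            print("  ", ch, describe(ch))
```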

References for French about syntax and display of abbreviated ordinals

http://www.academie-francaise.fr/abreviations-des-adjectifs-numeraux

http://www.langue-fr.net/spip.php?article239

http://www.les-abreviations.com/adjectifs.html

More references will be posted in the next comment because the number of external links is limited to 4 per post.

Proposed data for French

/common/dtd/ldml.dtd

Current:

<!ELEMENT ordinalMinimalPairs ( #PCDATA ) >
<!ATTLIST ordinalMinimalPairs ordinal NMTOKEN #IMPLIED >
<!ATTLIST ordinalMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST ordinalMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

Proposed:

<!ELEMENT ordinalMinimalPairs ( #PCDATA ) >
<!ATTLIST ordinalMinimalPairs ordinal NMTOKEN #IMPLIED >
<!ATTLIST ordinalMinimalPairs gender NMTOKEN #IMPLIED >
<!ATTLIST ordinalMinimalPairs number NMTOKEN #IMPLIED >
<!ATTLIST ordinalMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST ordinalMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

/common/main/fr.xml

Current:

<minimalPairs>

<pluralMinimalPairs count="one">{0} jour</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} jours</pluralMinimalPairs>
<ordinalMinimalPairs ordinal="one">Prenez la {0}re à droite.</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other">Prenez la {0}e à droite.</ordinalMinimalPairs>

</minimalPairs>

Proposed:

<minimalPairs>

<pluralMinimalPairs count="one">{0} jour</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} jours</pluralMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="feminine" number="singular">la {0}ʳᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="feminine" number="plural">les {0}ʳᵉˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="masculine" number="singular">le {0}ᵉʳ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="masculine" number="plural">les {0}ᵉʳˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="feminine" number="singular">la {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="feminine" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="singular">le {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>

</minimalPairs>
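As a rough illustration of how a consumer might use the proposed attributes, the sketch below parses the eight proposed elements and selects a pattern by ordinal category, gender and number. The function names are hypothetical, and the category rule ("one" for 1, "other" otherwise) is stated here as an assumption about the French ordinal rules rather than taken from this ticket.

```python
import xml.etree.ElementTree as ET

PROPOSED = """<minimalPairs>
<ordinalMinimalPairs ordinal="one" gender="feminine" number="singular">la {0}ʳᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="feminine" number="plural">les {0}ʳᵉˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="masculine" number="singular">le {0}ᵉʳ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="one" gender="masculine" number="plural">les {0}ᵉʳˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="feminine" number="singular">la {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="feminine" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="singular">le {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>
</minimalPairs>"""


def load_patterns(xml_text):
    """Map (ordinal, gender, number) to the pattern text of each element."""
    root = ET.fromstring(xml_text)
    return {
        (el.get("ordinal"), el.get("gender"), el.get("number")): el.text
        for el in root.iter("ordinalMinimalPairs")
    }


def ordinal_category(n):
    # Assumption: French uses "one" for n = 1 and "other" for everything else.
    return "one" if n == 1 else "other"


def format_ordinal(patterns, n, gender, number):
    """Pick the pattern for n, gender and number, and substitute {0}."""
    pattern = patterns[(ordinal_category(n), gender, number)]
    return pattern.replace("{0}", str(n))


if __name__ == "__main__":
    patterns = load_patterns(PROPOSED)
    print(format_ordinal(patterns, 1, "feminine", "singular"))
    print(format_ordinal(patterns, 2, "masculine", "plural"))
```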

The last 4 elements could be collapsed to:

<ordinalMinimalPairs ordinal="other" gender="feminine" number="singular">la {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="singular">le {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>

If the article is omitted, these would collapse further to:

<ordinalMinimalPairs ordinal="other" number="singular">{0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" number="plural">{0}ᵉˢ</ordinalMinimalPairs>
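The collapsed forms imply that an element without a gender attribute applies to both genders. One possible way to resolve this on the consumer side is sketched below, using the three-element collapse with articles. The lookup order (exact match first, then the gender-neutral entry) is an assumption of this sketch and is not specified by the ticket or by LDML; `load_patterns` and `resolve` are hypothetical names.

```python
import xml.etree.ElementTree as ET

COLLAPSED = """<minimalPairs>
<ordinalMinimalPairs ordinal="other" gender="feminine" number="singular">la {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" gender="masculine" number="singular">le {0}ᵉ</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other" number="plural">les {0}ᵉˢ</ordinalMinimalPairs>
</minimalPairs>"""


def load_patterns(xml_text):
    """Map (ordinal, gender, number) to pattern text; a missing attribute is None."""
    root = ET.fromstring(xml_text)
    return {
        (el.get("ordinal"), el.get("gender"), el.get("number")): el.text
        for el in root.iter("ordinalMinimalPairs")
    }


def resolve(patterns, ordinal, gender, number):
    """Return the most specific pattern, falling back to a gender-neutral one."""
    for key in ((ordinal, gender, number), (ordinal, None, number)):
        if key in patterns:
            return patterns[key]
    raise KeyError((ordinal, gender, number))


if __name__ == "__main__":
    patterns = load_patterns(COLLAPSED)
    # The plural entry has no gender attribute, so both genders resolve to it.
    print(resolve(patterns, "other", "feminine", "plural"))
    print(resolve(patterns, "other", "masculine", "singular"))
```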

Change History

comment:1 Changed 3 months ago by Marcel Schneider <charupdate@…>

comment:1 Changed 3 months ago by Marcel Schneider <charupdate@…>

comment:2 Changed 3 months ago by Marcel Schneider <charupdate@…>

Duplicate posting above is unintentional (due to a bug, see same comment number).
