[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11272 (new survey)

Opened 5 weeks ago

Last modified 3 days ago

Accurate digital representation of French boycotted though Unicode-conformant

Reported by: Marcel Schneider <charupdate@…> Owned by: anybody
Component: main Data Locale:
Phase: dsub Review:
Weeks: Data Xpath:


I have come to suspect that the industrial vetters for the fr-FR locale are being directed to boycott the most Unicode-conformant representation of French, the one that uses NNBSP.

Unicode-conformant fr-FR French

See TUS 10.0, page 269 https://www.unicode.org/versions/Unicode10.0.0/ch06.pdf#G17097:
«Narrow No-Break Space. U+202F narrow no-break space (NNBSP) is a narrow version of U+00A0 no-break space, which except for its display width behaves exactly the same in its line breaking behavior. It is regularly used in Mongolian in certain grammatical contexts (before a particle), where it also influences the shaping of the glyphs for the particle. In Mongolian text, the NNBSP is typically displayed with 1/3 the width of a normal space character. The NNBSP can be used to represent the narrow space occurring around punctuation characters in French typography, which is called an “espace fine insécable.”»

TUS 11.0, p. 265, is even slightly blunter (there is no more direct bookmark): «Narrow No-Break Space. U+202F narrow no-break space (NNBSP) is a narrow version of U+00A0 no-break space. The NNBSP can be used to represent the narrow space occurring around punctuation characters in French typography, which is called an “espace fine insécable.” It is used especially in Mongolian text to provide a small, nonbreaking gap before certain grammatical suffixes, and may trigger special shaping for those suffixes. See “Narrow No-Break Space” in Section 13.5, Mongolian, for more information.»
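To make the quoted passages concrete, here is a minimal Python sketch of putting U+202F before French high punctuation. The function name `french_spacing` and the rule set are assumptions made for illustration only: the sketch handles just `;`, `!` and `?`, leaves the colon and guillemets aside because conventions for them vary, and makes no attempt to skip URLs, markup, or code.

```python
import re
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

# Hypothetical rule: any run of SPACE, NBSP or THIN SPACE (possibly empty)
# that follows a non-space character and precedes ; ! ? becomes one NNBSP.
_HIGH_PUNCT = re.compile(r"(?<=\S)[ \u00a0\u2009]*([;!?]+)")

def french_spacing(text: str) -> str:
    """Insert U+202F before runs of ; ! ? (illustrative, not exhaustive)."""
    return _HIGH_PUNCT.sub(NNBSP + r"\1", text)

if __name__ == "__main__":
    print(unicodedata.name(NNBSP))
    # ascii() makes the otherwise invisible space visible as an escape.
    print(ascii(french_spacing("Vraiment ?")))
```

Text that already contains U+202F in these positions is left unchanged, so the function can be applied repeatedly.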

Conformant representation flawed

Note the wording “can be used to represent” rather than “is used to represent”. The latter would be more appropriate, since NNBSP is widely used and is the Unicode way of implementing French typography. In a specification, the verb "is/are" carries the sense of "should be" rather than "is/are always", which matters here because word processors do not use NNBSP. That seems to be what the industry is eager to maintain, at best under specious pretexts and flimsy allegations, but ordinarily with no disclosable rationale (ticket:11235#comment:22 and ticket:11235#comment:23). The point is that vendors accepting an accurate digital representation of French in CLDR, and thus in UIs, would raise expectations that they do the same in word processors. That would empower everybody to type regular French, something the general public is, or was, deliberately precluded from doing. So far it has been straightforward only in expensive DTP software, or through workarounds such as U+2009 U+FEFF, or later U+2009 U+2060 where supported (still not in MS Word), to emulate the U+202F used nowadays.

Now part of the industry seems to lobby against spreading the use of U+202F to the general public, so that the French people and all Francophones still do not enjoy a straightforward digital representation of their language within everybody’s reach, an unprecedented nastiness after 26 years of Unicode. No DTP vendor has the right to deliberately mess up any language’s representation. Nevertheless, some vendor(s) are ripping off the French language for their marketing purposes, taking users hostage by messing with their written expression until they end up subscribing.

CLDR vetting disturbed

On CLDR, to cover the tracks, the watchword seems to be to freeze the votes. For example, the localeKeyTypePattern was accepted with NNBSP in the wake of an early discussion (v34 survey, http://st.unicode.org/cldr-apps/v#forum/fr//25350 "[v34] 2018-06-10 03:34"). Then, supposedly, the conspiracy was alarmed, and a secret instruction was spread not to vote for any more NNBSPs, starting with the three following items, despite forum post http://st.unicode.org/cldr-apps/v#forum/fr//27975 ("[v34] 2018-07-05 16:55"). Meanwhile, the vote on the already accepted item has not been withdrawn, obviously so that I would not guess that there is a secret instruction, but would instead assume that everybody is on holiday and will catch up on July 23–25.

Probably with the same intent, votes like the one on the script name "Gothic" ("gotique" in French), discussed in a thread reopened at http://st.unicode.org/cldr-apps/v#forum/fr//27376 ("[v34] 2018-07-02 01:18"), remain uncast by the industrial vetters, against their better knowledge, even though the issue has been known for at least 16 days (actually for over two years, see "[v30] 2016-06-15 08:24") and voting closes in 7 days. The suspected reason for that inactivity is that fixing everything except NNBSP would be clear proof of a focused boycott. So, to leave me unable to deliver the proof that everybody will be asking for, vetters are directed to make a mess of French CLDR by no longer helping to fix the mess it still is. I say “no longer” because most of the vetters were cooperative for a while, supposedly (again) until they received secret instructions from their organization or from Unicode not to vote on any more items and not to post to the forum any more. These instructions for unlawful behavior, contrary to the CLDR guideline of open discussion on survey fora, fit a criminal operating mode and reflect badly on any organization involved in that conspiracy.


Change History

comment:1 Changed 3 days ago by Marcel Schneider <charupdate@…>

Update: This report is now superseded

The above is entirely superseded since all vetters voted unanimously for patterns using NNBSP in Locale Field Fallbacks, see http://st.unicode.org/cldr-apps/v#/fr/Locale_Name_Patterns/797e02e84e5c2d4f and following.

The group separator too has been set to NNBSP by TC approval, after most industrial vetters, namely Apple and Google, upvoted NNBSP, see http://st.unicode.org/cldr-apps/v#/fr/Symbols/a1ef41eaeb6982d
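As an illustration of what an NNBSP group separator means for formatted output, here is a small Python sketch. It hard-codes the separator rather than reading it from CLDR data, and the helper name `format_fr_integer` is hypothetical; real applications would normally obtain the symbol from a CLDR-backed library.

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE, used here as group separator

def format_fr_integer(n: int) -> str:
    """Group digits by thousands with U+202F (hard-coded, not read from CLDR)."""
    # Python's "," format option groups by three digits; swap in the NNBSP.
    return f"{n:,}".replace(",", NNBSP)

if __name__ == "__main__":
    # ascii() shows the separator as an escape: '1\u202f234\u202f567'
    print(ascii(format_fr_integer(1234567)))
```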

Measurement unit spacing too has been set to regular NNBSP with abbreviations, and NBSP with full names, thanks to votes of Google followed by Apple, see e.g. http://st.unicode.org/cldr-apps/v#/fr/Length/33a870a1b5965999 and surrounding items and sections.
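The spacing rule described here (NNBSP before an abbreviated unit, NBSP before a spelled-out unit name) can be sketched as follows. The `UNITS` table, its keys, and the function `format_length` are assumptions for illustration; the unit strings are not copied from the CLDR data, and plural handling is ignored.

```python
NBSP = "\u00a0"   # U+00A0 NO-BREAK SPACE, used with full unit names
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE, used with abbreviations

# Hypothetical miniature unit table; real data would come from CLDR.
UNITS = {
    "kilometer": {"short": "km", "long": "kilomètres"},
    "meter": {"short": "m", "long": "mètres"},
}

def format_length(value: int, unit: str, width: str) -> str:
    """Join value and unit with NNBSP (width='short') or NBSP (width='long')."""
    name = UNITS[unit][width]
    separator = NNBSP if width == "short" else NBSP
    return f"{value}{separator}{name}"

if __name__ == "__main__":
    print(ascii(format_length(5, "kilometer", "short")))
    print(ascii(format_length(5, "kilometer", "long")))
```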

As for the name of the Gothic language, Apple joined in voting for the correct spelling, so that the issue was fixed, see http://st.unicode.org/cldr-apps/v#/fr/Languages_E_J/55ba65aec648bde0 (the Gothic script name too is correctly spelled in French).


Thanks to everyone, vetters and TC member, who voted for correct values, and to the CLDR TC for extending the vetting deadline by 5 days (subsequent to fixing bug 11293).


This ticket may be closed. Thanks.

