[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11035(accepted)

Opened 8 months ago

Last modified 10 days ago

quotationStart for fr

Reported by: c960657-unicode.org@… Owned by: meikeh
Component: fix-in-survey-tool Data Locale: fr
Phase: dsub Review: mellie
Weeks: Data Xpath:


According to http://jkorpela.fi/html/french.html#spacing, French typography uses a narrow no-break space inside quotation marks, e.g. « foo » (note: the page contains a broken link to a Microsoft document, which has moved to https://docs.microsoft.com/en-us/typography/develop/character-design-standards/punctuation#guillemets).

quotationStart and quotationEnd for fr (and fr-CH) do not reflect this.

Note that according to Wikipedia, spaces are not used in Switzerland:


Change History

comment:1 follow-up: ↓ 3 Changed 5 months ago by mellie

  • Owner changed from anybody to SurveyTool
  • Priority changed from assess to medium
  • Status changed from new to accepted
  • Review set to mellie
  • Milestone changed from UNSCH to 34

comment:2 Changed 5 months ago by verdyp@…

Narrow no-break spaces are also used with other punctuation marks, notably before ":", ";", "!", and "?", as well as on both sides of en dashes marking ranges, where we want a clear distinction from the regular hyphens joining the parts of compound words (the notation using an ellipsis is also used, but is less common).
Narrow no-break spaces also exist in English, but English typographic conventions use a narrower one (about 1/8 em, where the French NNBSP is about 1/4 em to 1/6 em). For this reason, the punctuation glyphs in most digital fonts already include that space in their advance width, so in English it can be left unencoded. The result in French is that a regular NBSP or SPACE gives spacing that is much too large, so we need the NNBSP explicitly.

Typically, then, the encoded NNBSP uses the French metrics and is used specifically for French typography, not in English. But there are French fonts that do not include the English narrow space in the punctuation metrics at all, and in that case the NNBSP will be about 1/4 em; fonts whose punctuation follows English typographic metrics will use an NNBSP of about 1/6 em or smaller.

In all cases, NBSP or SPACE (1/2 em) should not be used in French, except as a fallback for NNBSP in some old renderers used with fonts that do not map NNBSP. Modern renderers can synthesize the fallback without needing updated fonts that supply the missing mapping: all unmapped whitespace characters should be synthesized (no "tofu" should appear).
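The spacing rules described in this comment could be sketched roughly as follows in Python. This is only an illustration, not anything CLDR or a renderer ships: the function names `french_spacing` and `with_fallback` and the `font_has_nnbsp` flag are hypothetical, and the rules are deliberately naive (a colon inside a URL or a time such as 12:30 would also get a space).

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
NBSP = "\u00a0"   # NO-BREAK SPACE, the wider fallback
SPACES = " \u00a0\u202f"


def french_spacing(text, space=NNBSP):
    """Put `space` before ; : ! ? and inside « » (naive sketch)."""
    # Before high punctuation that directly follows a word (plus optional spaces).
    text = re.sub(r"(?<=\w)[" + SPACES + r"]*([;:!?])",
                  lambda m: space + m.group(1), text)
    # After an opening guillemet.
    text = re.sub("«[" + SPACES + "]*", lambda m: "«" + space, text)
    # Before a closing guillemet.
    text = re.sub("[" + SPACES + "]*»", lambda m: space + "»", text)
    return text


def with_fallback(text, font_has_nnbsp):
    """Swap NNBSP for NBSP when the font is assumed not to map U+202F."""
    return text if font_has_nnbsp else text.replace(NNBSP, NBSP)


spaced = french_spacing("Quoi ? « foo »")
degraded = with_fallback(spaced, font_has_nnbsp=False)
```

The fallback is kept as a separate step so that the stored text always carries U+202F and only the rendering side decides whether to degrade it.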

comment:3 in reply to: ↑ 1 Changed 5 months ago by Marcel Schneider <charupdate@…>

Replying to mellie:
This cannot be fixed in ST, because ST is unable to handle a proper specification of quotation marks. A test routine prevents ST from accepting French spaced punctuation (punctuation marks with an associated space).

Just tried to input "« " (U+00AB U+202F) in:
Here’s the error message displayed:

[stop]Invalid delimiter. See https://sites.google.com/site/cldr/translation/characters for a list of valid delimiters.

[stop]Expected no more than 1 character(s), but was 2.

Please disable this test in ST so that proper data can be submitted now.
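To make the reported failure concrete, here is a minimal Python sketch of a length check that would produce the second message for the two-character input. It is not the Survey Tool's actual code: `check_delimiter`, `max_chars` and `allow_spacing` are hypothetical names, and the relaxed mode is just one possible way such a test could tolerate an attached space.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
SPACES = " \u00a0\u202f"


def check_delimiter(value, max_chars=1, allow_spacing=False):
    """Return a list of error messages for a quotation delimiter value."""
    # In the relaxed mode, spaces attached to the mark are not counted.
    core = value.strip(SPACES) if allow_spacing else value
    errors = []
    if len(core) > max_chars:
        errors.append(
            "Expected no more than %d character(s), but was %d."
            % (max_chars, len(core))
        )
    return errors


strict = check_delimiter("\u00ab" + NNBSP)                        # « + NNBSP
relaxed = check_delimiter("\u00ab" + NNBSP, allow_spacing=True)
```

With the strict check the spaced guillemet is rejected; with the hypothetical relaxed mode it passes while a genuinely over-long value would still be flagged.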

comment:4 Changed 5 months ago by Marcel Schneider <charupdate@…>

Please see Mellie’s forum thread http://st.unicode.org/cldr-apps/v#forum/fr//26054 for discussion of the core issue.

comment:5 Changed 5 months ago by Marcel Schneider <charupdate@…>

From that, one might infer that handling quotation marks in CLDR by giving them a delimiter status is less effective than replicating the scheme used for ellipsis and almost everywhere else in similar circumstances, i.e. using patterns.

For open quotes we’d then have things like “{0} or « {0} or «{0} or »{0}

Moreover, CLDR might wish to account for competing stylistic options, as we have in German, that uses a current quotation mark system („{0} then {0}“, nested: ‚{0} then {0}‘) on one hand, and a book-style quotation mark system (»{0} then {0}«, nested: ›{0} then {0}‹) on the other hand.
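As a rough illustration of the pattern idea, the following Python sketch stores whole patterns with a `{0}` placeholder instead of separate start and end delimiters, and keys them by locale, style and nesting level. The table layout, the style names "current" and "book", and the `quote` function are assumptions made for this example only; they are not CLDR data or an existing API. The French entry uses U+202F inside the guillemets, as requested in this ticket.

```python
# Hypothetical pattern table: locale -> style -> patterns by nesting level.
QUOTE_PATTERNS = {
    "fr": {
        "current": ["\u00ab\u202f{0}\u202f\u00bb"],  # « {0} » with NNBSP
    },
    "de": {
        # „{0}“ then nested ‚{0}‘
        "current": ["\u201e{0}\u201c", "\u201a{0}\u2018"],
        # »{0}« then nested ›{0}‹
        "book": ["\u00bb{0}\u00ab", "\u203a{0}\u2039"],
    },
}


def quote(text, locale, style="current", level=0):
    """Wrap `text` using the pattern for the given locale, style and level."""
    patterns = QUOTE_PATTERNS[locale][style]
    if level >= len(patterns):
        raise ValueError("no pattern for nesting level %d" % level)
    return patterns[level].format(text)


fr_quote = quote("foo", "fr")
de_nested = quote("Er sagte " + quote("ja", "de", level=1), "de")
de_book = quote("Wort", "de", style="book", level=1)
```

Because each entry is a complete pattern, the space belongs to the pattern rather than to the delimiter, so a one-character limit on delimiters no longer gets in the way, and further styles or nesting levels are just more list entries.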

Furthermore, CLDR needs to add fields for third level nesting.

Last but not least, one pair of quotation marks is used in discussions of scientific terminology.

Please note, too, that the above is about regular quotation marks. Besides, there are special conventions for typewritten text, due to the drastic limitations in available glyphs on typewriters, and accordingly in available characters on derived legacy computer keyboards and associated keyboard layouts.

The takeaway is that the four fields in CLDR are far from sufficient. The ~/Miscellaneous/Linguistic Elements/Quotation Marks section in CLDR needs to be overhauled.

Please take this comment as a feature request. Thanks for bringing up and looking into this topic.

comment:6 Changed 5 months ago by Marcel Schneider <charupdate@…>

Forgot the irony quotes: some locales have the means to distinguish them from quotation marks proper, which many users dislike seeing ‘abused’ as irony quotes, e.g. in French (see comments in https://www.ledevoir.com/societe/actualites-en-societe/488139/mises-aux-points-les-antiguillemets-comme-symboles-de-la-postverite). Typically, quotations are then bracketed with «  », and scare quotes are then English quotation marks, consistent with the origin of the scheme, whose name was coined by Elizabeth Anscombe in 1956 (https://en.wikipedia.org/wiki/Scare_quotes).

Please note, too, that some French people dislike single guillemets ‹  › and prefer mixing French-style and English-style quotation marks for nested quotations, which IMO is inappropriate. fr-CH uses «» and ‹› without any problem, and I suspect the issues in fr-FR might be related to not having ‹› in the type case, so that the “most considered” (scare quotes) style guide even recommends repeating first-level quotes at the second level, and leaving first-level or second-level quotes unpaired when a first-level quotation and a second-level quotation end simultaneously. Obviously that is not current practice at all, and it only serves to discredit the style manual.

I hope it becomes clear from all that how the quotation marks section of CLDR should be redesigned to account for all related locale settings.

comment:7 Changed 5 months ago by Marcel Schneider <charupdate@…>

Wondering what CLDR quotation marks are actually used for, I infer from Philippe Verdy’s point that CLDR might wish to host settings for all punctuation marks whose usage is locale-dependent.

Featuring this content would require a new section titled "Other Punctuation", containing the colon, semicolon, question and exclamation marks, and probably strong (period, danda) and weak (comma) phrase separators. Including punctuation for parenthetical set-offs (paired, like brackets, and unpaired, like dashes) would equally be useful, I guess, although the latter would split up en-US (according to style manuals) or require several alternate patterns.

comment:8 Changed 5 weeks ago by pedberg

  • Milestone changed from 34 to upcoming

CLDR 34 BRS closing item, move all open 34 → upcoming

comment:9 Changed 3 weeks ago by mark

  • Owner changed from SurveyTool to kristi
  • Component changed from other to survey-submit

comment:10 Changed 3 weeks ago by mark

  • Milestone changed from upcoming to 35-optional

comment:11 Changed 10 days ago by kristi

  • Owner changed from kristi to meikeh

This was initially assigned to Mellie. Was this done in v34? Please reassess the milestone.

