Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #11255(new survey)

Opened 9 days ago

Last modified 4 days ago

Measurement spacing fixes in LDML file to upload?

Reported by: Marcel Schneider <charupdate@…>
Owned by: anybody
Component: survey
Phase: dsub

Description

Hundreds of wrongly spaced items (mostly using U+0020), even in English, need to be corrected.

Doing this in Survey Tool (ST) is inefficient and will overrun the deadline, since the job is being done by a single person and must be completed for consistency.

Leaving breakable spaces is unacceptable between numbers and measurement units.

https://academia.stackexchange.com/questions/54885/should-there-be-a-space-between-a-value-and-the-units-used

And many other pages across the internet.

See also forum post http://st.unicode.org/cldr-apps/v#forum/fr//28281

I’m asking for permission to do edits in LDML and upload for consideration by TC and vetters.

I’m also asking whether I may exceed the deadline, as July 10 does not leave enough submission time, given that I’m fixing the spaces alone.
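The kind of edit requested here can be sketched as follows. This is only an illustration, assuming unit patterns of the form "{0} km" and assuming NBSP (U+00A0) as the replacement; the helper name fix_unit_pattern is hypothetical, and whether NBSP or NNBSP is appropriate for a given locale is exactly what the ticket leaves open.

```python
import re

NBSP = "\u00A0"   # NO-BREAK SPACE
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE

# A plain space (U+0020) directly after or before a placeholder such as {0}.
_SPACE_AFTER = re.compile(r"(\{\d+\}) ")
_SPACE_BEFORE = re.compile(r" (\{\d+\})")


def fix_unit_pattern(pattern: str, space: str = NBSP) -> str:
    """Replace U+0020 adjacent to a {n} placeholder with a no-break space.

    Hypothetical helper: spaces elsewhere in the pattern are left alone.
    """
    pattern = _SPACE_AFTER.sub(lambda m: m.group(1) + space, pattern)
    return _SPACE_BEFORE.sub(lambda m: space + m.group(1), pattern)


if __name__ == "__main__":
    # Illustrative patterns only, not taken from any CLDR release.
    for sample in ["{0} km", "{0} square meters", "{0}km"]:
        print(repr(sample), "->", repr(fix_unit_pattern(sample)))
```

Only the space touching the placeholder is changed, so a multi-word unit name keeps its ordinary internal spaces. Passing space=NNBSP selects the narrow variant instead.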

Change History

comment:1 follow-up: ↓ 5 Changed 7 days ago by doug@…

even in English

I object to the premise of applying this change to English. Except for specialized scientific and academic typesetting, which is outside the scope of CLDR, it is not considered a problem in English text to break lines between a numeric value and a unit of measure. Inserting NNBSP or NBSP into these formats would redefine non-specialist English usage instead of reflecting current accepted usage.

comment:2 in reply to: ↑ description ; follow-ups: ↓ 3 ↓ 4 Changed 6 days ago by srl

Replying to Marcel Schneider <charupdate@…>:

I’m asking for permission to do edits in LDML and upload for consideration by TC and vetters.

You can always do this yourself. You can prepare offline LDML and upload it as your vote: http://cldr.unicode.org/index/survey-tool/upload

I object to the premise of applying this change to English

I would tend to agree with Doug about not applying this to non-specialist English.

comment:3 in reply to: ↑ 2 ; follow-up: ↓ 9 Changed 6 days ago by Marcel Schneider <charupdate@…>

Replying to doug@…:

even in English

I object to the premise of applying this change to English. Except for specialized scientific and academic typesetting, which is outside the scope of CLDR, it is not considered a problem in English text to break lines between a numeric value and a unit of measure. Inserting NNBSP or NBSP into these formats would redefine non-specialist English usage instead of reflecting current accepted usage.

Then I’ve missed the point about CLDR. As far as I understood the topic, CLDR is not a database for input checking, such as a spellcheck library or a grammar checking tool. CLDR defines how user interfaces are displayed to the end-user. It’s almost the same challenge as in academic publishing, where line breaks within measures (i.e. between the number and the unit) reflect badly on the publisher (and editor and author). During this survey I was always thinking that software vendors like Microsoft, Apple and Google respond to the same quality standards and wouldn’t like to have their UIs display poorly.

I have also sometimes asked, perhaps in other contexts, what the official policy is at Apple, Google and Microsoft. User interfaces do not fall under “non-specialist English usage,” unless they are set up to imitate informal writing in an attempt to make the user feel comfortable, although I don’t know whether end-users actually respond to that appeal.

Replying to srl:
[…]

I object to the premise of applying this change to English

I would tend to agree with Doug about not applying this to non-specialist English.

Perhaps I’m biased, because in French we have a somewhat complex relationship with some non-specialist usage. There is a discrepancy between the current non-standard keyboard (non-standard because standardization has not been completed, for a number of very complicated reasons, not for lack of goodwill on the part of the French people and specialists) and the rules of orthography and typesetting, so that a text typed into a computer following “current accepted usage” may end up looking ugly due to spelling and layout issues. So we need new keyboard layouts that support NNBSP anyway, for everybody’s use.

In this (locale) context I don’t believe that UIs should do anything worse. And from this I extrapolate to English, which uses the same script and has roughly the same cultural background. I sincerely believed that some layout conventions are in current use in our civilization, and that serious vendors have no interest in deliberately falling short of these standards.

comment:4 in reply to: ↑ 2 Changed 6 days ago by Marcel Schneider <charupdate@…>

Replying to srl:

Replying to Marcel Schneider <charupdate@…>:

I’m asking for permission to do edits in LDML and upload for consideration by TC and vetters.

You can always do this yourself. You can prepare offline LDML and upload it as your vote: http://cldr.unicode.org/index/survey-tool/upload

Thank you, I didn’t know that. I submitted in ticket:11258, but as ST was up again, I could complete the section there.
I must confess, however, that editing in LDML wouldn’t have been possible for me without first having gathered an overview in Survey Tool. And many tasks and sections are more easily completed in ST.
Unfortunately the deadline has now passed, otherwise I would consider voting on annotations this way. However, there is a significant drawback, as /annotations/ contain code points as literals, not scalar values, and these don’t show up in ST either (but do on <http://unicode.org/repos/cldr-aux/charts/33/annotations/germanic.html> 10206).

comment:5 in reply to: ↑ 1 Changed 6 days ago by Marcel Schneider <charupdate@…>

Replying to doug@…:
[…]

it is not considered a problem in English text to break lines between a numeric value and a unit of measure. Inserting NNBSP or NBSP into these formats would redefine non-specialist English usage instead of reflecting current accepted usage.

My assumption that CLDR currently supports high-quality typesetting is also fueled by its occasional use of the letter apostrophe (in 4 instances in /main/en.xml: "Gwichʼin", "Metaʼ", "Kʼicheʼ", "Gwichʼin") and its use of curly quotation marks for English. In this context, not using non-breakable spaces where high-quality typesetting requires them is less a design choice than a mistake, due to inadvertence under the influence of outdated legacy keyboard layouts (although Apple’s US and French keyboard layouts, and perhaps those of all other locales, already have NBSP on Option+Space and Shift+Option+Space) and to ST not clearly displaying the most common confusables.

comment:6 Changed 6 days ago by Marcel Schneider <charupdate@…>

Is asking Apple, Google and Microsoft—and the other CLDR users—about their preferences on typography and UI display more challenging than asking a government agency about how to spell the name of a language spoken on their territory? Yet I suspect that first answers could be biased out of an abundance of caution not to get into trouble with existing data and workforce using it uncritically—a bit like when an agency hesitates to follow up after being informed of what’s at stake.

comment:7 Changed 6 days ago by Marcel Schneider <charupdate@…>

FYI: Part of this ticket has now been written up for publication on the forum as http://st.unicode.org/cldr-apps/v#forum/fr//29137.

comment:8 Changed 6 days ago by Marcel Schneider <charupdate@…>

A reason why I distrust the Approved Data in CLDR is documented in this forum thread (where I’ve posted the above-mentioned content):

http://st.unicode.org/cldr-apps/v#forum/fr//28313

Usage of SP vs NBSP is inconsistent. Guidelines for generating narrow-displayName values (no space) are applied only in part, mainly because too few submitters are under Coverage:Comprehensive.
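One way to quantify this kind of inconsistency offline is to tally which space character each unit pattern uses. The following is a minimal sketch, assuming LDML-style unitPattern elements; the SAMPLE data is made up for illustration and the function names classify and audit are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up LDML fragment for illustration; not copied from CLDR data.
SAMPLE = """<ldml><units><unitLength type="short">
<unit type="length-kilometer">
  <unitPattern count="one">{0} km</unitPattern>
  <unitPattern count="other">{0}\u00A0km</unitPattern>
</unit>
<unit type="length-meter">
  <unitPattern count="other">{0}m</unitPattern>
</unit>
</unitLength></units></ldml>"""


def classify(pattern: str) -> str:
    """Label a pattern by the kind of space it contains."""
    if "\u0020" in pattern:
        return "SP"
    if "\u00A0" in pattern or "\u202F" in pattern:
        return "no-break"
    return "no space"


def audit(xml_text: str) -> Counter:
    """Count unitPattern values per space category."""
    root = ET.fromstring(xml_text)
    return Counter(classify(el.text or "") for el in root.iter("unitPattern"))


if __name__ == "__main__":
    for kind, count in sorted(audit(SAMPLE).items()):
        print(kind, count)
```

A pattern containing both U+0020 and a no-break space is reported as "SP" here, on the assumption that any breakable space is what a reviewer would want flagged first.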

These details are posted here not to assign blame, but for you to understand that CLDR data is poorly set up and needs to be corrected, ideally first in English at the instigation of the TC. At present, software vendors are still poorly served. By advocating for better service to CLDR users, we simultaneously try to get correct French typesetting, notably correct punctuation spacing, into UI display, so that end-users will no longer be encouraged to mess up their own writing.

comment:9 in reply to: ↑ 3 ; follow-up: ↓ 10 Changed 4 days ago by doug@…

Replying to Marcel Schneider <charupdate@…>:

During this survey I was always thinking that software vendors like Microsoft, Apple and Google respond to the same quality standards and wouldn’t like to have their UIs display poorly.

This is the assertion that needs to be investigated and not taken on faith: that a line break between a numeric value and a unit of measure is perceived as "poor display" in English contexts.

comment:10 in reply to: ↑ 9 Changed 4 days ago by Marcel Schneider <charupdate@…>

Replying to doug@…:
[…]

This is the assertion that needs to be investigated and not taken on faith: that a line break between a numeric value and a unit of measure is perceived as "poor display" in English contexts.

For academics it is unacceptable, given the effort made to avoid these breaks. Since academic communities are a significant subset of end-users, I don’t believe that disregarding their standards is good policy.

While I’m not in the best position to get feedback from software vendors (and those who are have got the hint), I guess that vendors, if aware of the issue, would love to align UI display with academic standards. However, I further guess that they refrain from pressing their workforce to make this happen by adding various no-break spaces, perhaps not so much because the issue is still off their radar, or because they want to stay cool, but rather because the issue has long been fixed in a quite different manner: always leave enough space in UIs for measurement units to display on a single line.

While there is no point in taking a sledgehammer to crack a nut, I continue to advocate academic typesetting, which BTW might be flagged “academic” only because access to the means is uncommon as long as certain legacy keyboard layouts stay in use; as long as they do, seeing measurement units broken across lines is so common (outside of UIs) that it isn’t considered poor display.

Back to CLDR: Setting up correct usage of space characters in the database is so easy that I see no technical reason not to do it. However, I see that nearly all changes based on feedback (this and others) are very slow, although many of them are technically lightweight. For me it’s a question of following through. For lack of anything better, I can still refer to the relevant tickets (which are public, unlike the CLDR survey fora) when writing up end-user documentation. I must try not to get impacted by sluggish processes, as I need to focus on other top priorities.

Thank you for bringing out the point, and for raising this.
