[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #11046(new data)

Opened 13 days ago

Last modified 9 days ago

Investigate data needed for number range formatting

Reported by: shane
Owned by: anybody
Component: numbers
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description (last modified by shane) (diff)

CLDR already has one field for range formatting data:

https://www.unicode.org/cldr/charts/latest/by_type/numbers.number_formatting_patterns.html#Miscellaneous_Patterns_

However, that data point does not answer questions such as the following:

  1. When formatting currency ranges, do you show the currency sign once ($3-5) or twice ($3-$5)?
  2. How do ranges work for negative numbers, like -3 to -5?
  3. How about data for word ranges, like "3 to 5" instead of "3-5"?
  4. How are scientific notation, compact notation, and measure units affected? (1E2-2E2 or 1-2E2? 1K-3K or 1-3K? 1 foot-3 feet, 1 foot - 3 feet, or 1-3 feet?)
  5. What happens when the range bounds become the same after rounding? This could happen if you have integer rounding and ask for the range 4.9-5.1, for example. ICU should not give an API that prints "$5-5" as the default behavior; we will need some sensible fallback, perhaps an "approximate" pattern like "~$5".

These questions should be answered in order to build a robust number range formatter in ICU.
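
To make questions 1 and 5 concrete, here is a minimal Python sketch. It is not ICU or CLDR code: the function name `format_currency_range`, the `repeat_symbol` flag, and the "~" approximate pattern are all assumptions made for illustration, and the layout is hard-coded to an English-style "$" prefix with integer rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def format_currency_range(low, high, symbol="$", repeat_symbol=False):
    """Hypothetical helper: format a currency range with integer rounding."""
    # Round both bounds to whole numbers, as in the 4.9-5.1 example.
    lo = Decimal(str(low)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    hi = Decimal(str(high)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    if lo == hi:
        # Bounds collapsed after rounding: fall back to an assumed
        # "approximate" pattern instead of printing "$5-5".
        return f"~{symbol}{lo}"
    if repeat_symbol:
        # Question 1, second option: symbol on both bounds.
        return f"{symbol}{lo}-{symbol}{hi}"
    # Question 1, first option: symbol shown once.
    return f"{symbol}{lo}-{hi}"

print(format_currency_range(3, 5))                      # $3-5
print(format_currency_range(3, 5, repeat_symbol=True))  # $3-$5
print(format_currency_range(4.9, 5.1))                  # ~$5
```

Which branch is the right default for a given locale is exactly the data this ticket asks about; the sketch only shows where that data would plug in.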

Change History

comment:1 Changed 13 days ago by shane

  • Description modified (diff)

comment:2 Changed 10 days ago by shane

  • Description modified (diff)

comment:3 Changed 10 days ago by mark

Those are good questions: I'll add some comments. However, to get a sense of the variation, we should also see what various style guides say, both for publications (The Economist) and for academic writing (CMoS).

  1. When formatting currency ranges, do you show the currency sign once ($3-5) or twice ($3-$5)?
    • I suspect a good default for currencies and units is to show the currency symbol (or unit) only once. We could address that either by adding boolean flags (duplicateUnitInRange, duplicateCurrencyInRange) or by adding patterns (unitRange and currencyRange), like "{2}{0}{1}". However, I suspect that the range pattern for English with currencies or units might need spaces around the en-dash, e.g. 1 – 5kg rather than 1–5kg.
  2. How do ranges work for negative numbers, like -3 to -5?
    • I don't think we care too much how these are formatted, because negative ranges will be very infrequent. (I don't have a particularly good intuition as to how they should be formatted in English: "-50 – -40kg"? "-50..-40kg"?) It might be better to fall back to including the unit/currency. We could add an additional pattern, but I think it is a lower priority.
  3. How about data for word ranges, like "3 to 5" instead of "3-5"?
    • Those get really ugly, since they often require inflections. Probably best to avoid.
  4. How are scientific notation, compact notation, and measure units affected? (1E2-2E2 or 1-2E2? 1K-3K or 1-3K? 1 foot-3 feet, 1 foot - 3 feet, or 1-3 feet?)
    • I don't think we care about "programmer scientific notation". I suspect it would be OK for real scientific notation to just use the range pattern. Compact numbers are a problem, since 1–5K could mean 1–5000 or 1000–5000. I think units should work like currencies.
  5. What happens when the range bounds become the same after rounding? This could happen if you have integer rounding and ask for the range 4.9-5.1, for example. ICU should not give an API that prints "$5-5" as the default behavior; we will need some sensible fallback, perhaps an "approximate" pattern like "~$5".
    • Very interesting case. I think your suggestion of an "approximate" pattern might be the best answer.
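
The compact-notation ambiguity and the "unit only once" default from the replies above can be sketched in Python. This is only an illustration under stated assumptions: `compact`, `compact_range`, `unit_range`, and the `collapse_suffix` and `spaced` parameters are hypothetical names, the "K" suffix handling is English-only, and none of it reflects actual CLDR data or ICU behavior.

```python
EN_DASH = "\u2013"

def compact(n):
    """Hypothetical English-only compact form: 5000 -> '5K', 5 -> '5'."""
    if n >= 1000 and n % 1000 == 0:
        return f"{n // 1000}K"
    return str(n)

def compact_range(low, high, collapse_suffix):
    """Join two compact numbers, optionally dropping a shared 'K' on the low bound."""
    lo, hi = compact(low), compact(high)
    if collapse_suffix and lo.endswith("K") and hi.endswith("K"):
        lo = lo[:-1]
    return f"{lo}{EN_DASH}{hi}"

def unit_range(low, high, unit, spaced):
    """Show the unit once; 'spaced' puts spaces around the en-dash."""
    sep = f" {EN_DASH} " if spaced else EN_DASH
    return f"{low}{sep}{high}{unit}"

# With the suffix collapsed, two different ranges print identically.
print(compact_range(1000, 5000, collapse_suffix=True))   # 1–5K
print(compact_range(1, 5000, collapse_suffix=True))      # 1–5K
print(compact_range(1000, 5000, collapse_suffix=False))  # 1K–5K
print(unit_range(1, 5, "kg", spaced=True))               # 1 – 5kg
```

The first two calls show why collapsing a compact suffix is risky: the reader cannot tell 1000–5000 from 1–5000. Units and currencies do not have that problem, since the unit applies to both bounds either way.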

comment:4 Changed 9 days ago by shane

On each point:

  1. Are we sure all locales are going to prefer one currency sign instead of two? Is there no variation between locales? I would be more comfortable hearing from some l10n experts.
  2. Agreed that this is lower priority for now.
  3. I think we should eventually get this data added to CLDR. So many other items have both the "narrow" form and the "long" form. But it might not be required for the first version of this feature.
  4. Like with 1, I would like to see feedback from l10n experts.
  5. Cool; how do we go about acquiring the data for the approximate pattern? Can we get it with the summer survey tool cycle?
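
As a rough illustration of point 3, range patterns could be keyed by width, the way other items distinguish "narrow" and "long" forms. The sketch below is hypothetical: the `RANGE_PATTERNS` table, its keys, and both pattern strings are written by hand for English and are not taken from CLDR, and it sidesteps the inflection problem raised in comment:3.

```python
# Hypothetical English-only table; keys and patterns are assumptions.
RANGE_PATTERNS = {
    "narrow": "{0}\u2013{1}",  # en-dash between the bounds
    "long": "{0} to {1}",      # word range; ignores inflection entirely
}

def format_range(low, high, width="narrow"):
    """Substitute the two bounds into the pattern for the requested width."""
    return RANGE_PATTERNS[width].format(low, high)

print(format_range(3, 5))          # 3–5
print(format_range(3, 5, "long"))  # 3 to 5
```

A first version of the feature could ship with only the "narrow" entry and add further widths once the data exists.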

comment:5 Changed 9 days ago by shane

  • Cc mark added

Adding mark as CC; see my replies above.
