Unicode Common Locale Data Repository: Bug Tracking
 

CLDR Ticket #11563(closed: fixed)

Opened 2 months ago

Last modified 5 weeks ago

Add more units for v35

Reported by: mark Owned by: shane
Component: units Data Locale:
Phase: dsub Review: mark
Weeks: Data Xpath:
Xref:

Description (last modified by mark)

As discussed, we would like to add the following units:

CLDR Identifier | Full Name | Description | root | en, type=short | en, type=long, count=one | en, type=long, count=other
mass-atomic-mass-unit | atomic mass unit | a unit of mass used to express atomic and molecular weights, equal to one-twelfth of the mass of an atom of carbon-12 | {0} amu | {0} u | {0} atomic mass unit | {0} atomic mass units
area-dunam | Turkish dunam | a measure of land area used in parts of the former Turkish empire, including Israel (where it is equal to about 900 square meters) | {0} dunam | {0} dunam | {0} dunam | {0} dunams
mass-earth-mass | Earth mass | Astronomy: the unit of mass equal to that of Earth | {0} M⊕ | {0} M⊕ | {0} Earth mass | {0} Earth masses
energy-electronvolt | electronvolt, eV | physics, the electronvolt (symbol eV, also written electron-volt and electron volt) | {0} eV | {0} eV | {0} electronvolt | {0} electronvolts
angle-astronomical-hour | hour angle | astronomy: the angle measured in hours of time, solar | {0}h | {0}h | {0} hour | {0} hours
angle-astronomical-minute | minute angle | astronomy: the angle measured in minutes of time. Note: this is different from 'minute-of-arc' by a factor of 15, see https://en.wikipedia.org/wiki/Right_ascension#Symbols_and_abbreviations | {0}m | {0}m | {0} minute | {0} minutes
force-newtons | Newtons, N | A newton (N) is the international unit of measure for force | {0} N | {0} N | {0} Newton | {0} Newtons
pressure-kilopascal | kilopascal (kPa) | Kilopascal, unit of pressure | {0} kPa | {0} kPa | {0} kilopascal | {0} kilopascals
pressure-megapascal | megapascal (MPa) | Megapascal, unit of pressure | {0} MPa | {0} MPa | {0} megapascal | {0} megapascals
angle-astronomical-second | second of time angle | Astronomical, solar. Note: this is different from 'second-of-arc' by a factor of 15, see https://en.wikipedia.org/wiki/Right_ascension#Symbols_and_abbreviations | {0}s | {0}s | {0} second | {0} seconds
light-solar-luminosity | solar luminosity | a unit of radiant flux used by astronomers to measure the luminosity of stars, galaxies in terms of the output of the Sun | {0} L☉ | {0} L☉ | {0} solar luminosity | {0} solar luminosities
mass-solar-mass | solar mass | unit of mass in astronomy, used to indicate the masses of other stars and galaxies in terms of the mass of the Sun | {0} M☉ | {0} M☉ | {0} solar mass | {0} solar masses
length-solar-radius | solar radius | this includes the astronomical symbol for the Sun: R☉ | {0} R☉ | {0} R☉ | {0} solar radius | {0} solar radiuses
torque-newton-meters | torque, Newton·meters | unit of torque (also called moment), in Newton*meters | {0} N·m | {0} N·m | {0} Newton·meter | {0} Newton·meters
torque-pound-foot | torque, pound·foot | unit of torque (also called moment), in pound*foot | {0} lb-ft | {0} lb-ft | {0} pound·foot | {0} pound·feet
energy-british-thermal-unit | BTU, British thermal unit | BTU, British thermal unit, is a traditional unit of heat | {0} btu | {0} BTU | {0} BTU | {0} BTU
concentr-mole | mole | mole, unit of measurement for amount of substance | {0} mol | {0} mol | {0} mole | {0} moles
volume-barrel-oil | oil barrels | Unit of volume for crude oil. One barrel equals 42 US gallons | {0} barrels | {0} barrels | {0} barrel | {0} barrels
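The two English long-form columns are selected by plural category. A minimal Python sketch of that selection, assuming the CLDR English cardinal rule (`one` when the integer part i = 1 and there are no visible fraction digits, v = 0; otherwise `other`). `LONG_PATTERNS`, `english_plural_category`, and `format_long` are hypothetical names, not CLDR tooling; the solar-radius plural uses "solar radii", as adopted later in comment:9.

```python
from decimal import Decimal

# A few English long-form patterns from the proposal, keyed by plural category.
LONG_PATTERNS = {
    "mass-solar-mass": {"one": "{0} solar mass", "other": "{0} solar masses"},
    "length-solar-radius": {"one": "{0} solar radius", "other": "{0} solar radii"},
    "concentr-mole": {"one": "{0} mole", "other": "{0} moles"},
}

def english_plural_category(number: str) -> str:
    """English cardinal rule: 'one' if i = 1 and v = 0, else 'other'.

    i is the absolute integer part; v is the number of visible fraction
    digits, so "1" selects 'one' but "1.0" selects 'other'.
    """
    integer_part = abs(int(Decimal(number)))
    visible_fraction = number.partition(".")[2]
    return "one" if integer_part == 1 and len(visible_fraction) == 0 else "other"

def format_long(unit_id: str, number: str) -> str:
    """Substitute the formatted number into the pattern for its plural category."""
    pattern = LONG_PATTERNS[unit_id][english_plural_category(number)]
    return pattern.replace("{0}", number)

print(format_long("length-solar-radius", "1"))
print(format_long("length-solar-radius", "2"))
```

Passing the number as an already-formatted string keeps the visible fraction digits, which is what the rule depends on.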


Change History

comment:1 Changed 2 months ago by mark

  • Description modified

comment:2 in reply to: ↑ description Changed 2 months ago by kent.karlsson14@…

Replying to mark:

mass-atomic-mass-unit atomic mass unit a unit of mass used to express atomic and molecular weights, equal to one-twelfth of the mass of an atom of carbon-12. {0} amu {0} u {0} atomic mass unit {0} atomic mass units

It's not that straightforward. See en.wikipedia.org/wiki/Unified_atomic_mass_unit. It's "Da" (often with the kilo prefix, kDa, common in chemistry) or "u"; a pity there are two designations. "amu" was something similar, but not the same.

area-dunam Turkish dunam a measure of land area used in parts of the former Turkish empire, including Israel (where it is equal to about 900 square meters). {0} dunam {0} dunam {0} dunam {0} dunams

I think the current Egyptian translation for acre in CLDR is (erroneously) a translation of dunam, or of a dunam-like unit (namely the feddan). The major problem with this "unit" is that it stands for different areas in different regions; see en.wikipedia.org/wiki/Dunam#Definition. Further, it is only of very local importance/use, and should NOT be offered up for translation around the world (even setting aside the MAJOR problem given in the previous sentence).

energy-electronvolt electronvolt, eV physics, the electronvolt (symbol eV, also written electron-volt and electron volt) {0} eV {0} eV {0} electronvolt {0} electronvolts
force-newtons Newtons, N A newton (N) is the international unit of measure for force {0} N {0} N {0} Newton {0} Newtons
pressure-kilopascal kilopascal (kPa) Kilopascal, unit of pressure {0} kPa {0} kPa {0} kilopascal {0} kilopascals
pressure-megapascal megapascal (MPa) Megapascal, unit of pressure {0} MPa {0} MPa {0} megapascal {0} megapascals
concentr-mole mole mole, unit of measurement for amount of substance {0} mol {0} mol {0} mole {0} moles

Those seem nonproblematic.

angle-astronomical-hour hour angle astronomy: the angle measured in hours of time, solar {0}h {0}h {0} hour {0} hours
angle-astronomical-minute minute angle "astronomy: the angle measured in minutes of time / Note: this is different from 'minute-of-arc' by a factor of 15, see https://en.wikipedia.org/wiki/Right_ascension#Symbols_and_abbreviations" {0}m {0}m {0} minute {0} minutes
angle-astronomical-second second of time angle "Astronomical, solar / Note: this is different from 'second-of-arc' by a factor of 15, see https://en.wikipedia.org/wiki/Right_ascension#Symbols_and_abbreviations" {0}s {0}s {0} second {0} seconds

Those cannot be added, not like that anyway. There is a Wikipedia article on "hour angle" (en.wikipedia.org/wiki/Hour_angle), but it does not mention a unit (apart from a hint using a SUPERSCRIPT h (ʰ); the superscripting is important). There is no Wikipedia article on THIS KIND of minute or second angle (apart from the common arc minute and arc second). The article you refer to hints at SUPERSCRIPTED h (ʰ), m (ᵐ), s (ˢ). The long forms cannot be like that either; they would need to be "astronomical hour/minute/second angle" or similar. These units do not seem to be very popular: (English) Wikipedia barely mentions them (OK, they do occur in a few more articles), and if they were really popular units, we would surely see long explanatory articles in Wikipedia by now. So I suggest skipping these in CLDR.
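The factor of 15 mentioned in the proposal follows from 24 hours of time spanning 360 degrees: 1ʰ = 15°, so 1ᵐ = 15′ (arc minutes) and 1ˢ = 15″ (arc seconds). A small Python sketch with exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

DEGREES_PER_HOUR_ANGLE = 15  # 360 degrees / 24 hours

def time_angle_to_degrees(h: int, m: int = 0, s: int = 0) -> Fraction:
    """Convert an angle given in hours, minutes, and seconds of time to degrees."""
    hours = Fraction(h) + Fraction(m, 60) + Fraction(s) / 3600
    return hours * DEGREES_PER_HOUR_ANGLE

def degrees_to_arcminutes(deg: Fraction) -> Fraction:
    return deg * 60

def degrees_to_arcseconds(deg: Fraction) -> Fraction:
    return deg * 3600

# One minute of time is a quarter degree, i.e. 15 minutes of arc.
print(degrees_to_arcminutes(time_angle_to_degrees(0, 1)))
```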

mass-earth-mass Earth mass Astronomy: the unit of mass equal to that of Earth {0} M⊕ {0} M⊕ {0} Earth mass {0} Earth masses
light-solar-luminosity solar luminosity a unit of radiant flux used by astronomers to measure the luminosity of stars, galaxies in terms of the output of the Sun {0} L☉ {0} L☉ {0} solar luminosity {0} solar luminosities
mass-solar-mass solar mass unit of mass in astronomy, used to indicate the masses of other stars and galaxies in terms of the mass of the Sun {0} M☉ {0} M☉ {0} solar mass {0} solar masses
length-solar-radius solar radius this includes the astronomical symbol for the Sun: R☉ {0} R☉ {0} R☉ {0} solar radius {0} solar radiuses

1) The earth symbol (here circled plus, not ♁) as well as the sun symbol should be subscripted (I know, cannot do that in plain text, but a common workaround is to use underscore).
2) They are not really units (they are variables, whose values are only approximately known), but can be 'abused' as units.
3) You probably mean nominal solar radius (see en.wikipedia.org/wiki/Solar_radius#Nominal_solar_radius), with superscript N.
4) The solar luminosity is even less of a constant (the largest variation, for foreseeable time, is the 11-year sunspot cycle), so I guess you mean nominal solar luminosity, allowing it to be (ab)used as a unit.
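Points 3) and 4) refer to nominal solar values. A minimal Python sketch, assuming the IAU 2015 Resolution B3 nominal conversion constants (R☉ = 6.957 × 10⁸ m, L☉ = 3.828 × 10²⁶ W). Because these are defined exactly, converting to SI is fixed even though the Sun's actual radius and luminosity vary. Constant and function names are illustrative.

```python
# Nominal solar conversion constants (IAU 2015 Resolution B3), exact by definition.
NOMINAL_SOLAR_RADIUS_M = 6.957e8        # metres
NOMINAL_SOLAR_LUMINOSITY_W = 3.828e26   # watts

YOTTA = 1e24  # SI prefix Y

def solar_radii_to_metres(r: float) -> float:
    return r * NOMINAL_SOLAR_RADIUS_M

def solar_luminosities_to_watts(lum: float) -> float:
    return lum * NOMINAL_SOLAR_LUMINOSITY_W

# One nominal solar luminosity in yottawatts.
print(round(solar_luminosities_to_watts(1) / YOTTA, 1))  # 382.8
```

The last line corresponds to the "about 383 YW" figure given in comment:29.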

torque-newton-meters torque, Newton·meters unit of torque (also called moment), in Newton*meters {0} N·m {0} N·m {0} Newton·meter {0} Newton·meters

Using · for this multiplication is correct. But it is commonly omitted: "Nm". In the long form, however, hyphen would be appropriate (multiplication dot looks odd in the long form).

torque-pound-foot torque, pound·foot unit of torque (also called moment), in pound*foot {0} lb-ft {0} lb-ft {0} pound·foot {0} pound·feet

Why a hyphen for the short form? It would be appropriate for the long form, though. Further, this is only of very local importance/use, and should NOT be offered up for translation around the world. And the symbol should be "lbf⋅ft".
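The notation suggested here (a multiplication dot in short forms, a hyphen in long forms) can be captured by one joining rule for product units. A sketch with a hypothetical helper; it assumes U+22C5 DOT OPERATOR for the dot, as in "lbf⋅ft".

```python
from __future__ import annotations

# Hypothetical helper, not a CLDR API.
# Assumption: short/narrow forms join with U+22C5 DOT OPERATOR,
# long forms join with a hyphen.
DOT_OPERATOR = "\u22c5"

def product_unit_name(components: list[str], width: str) -> str:
    """Join component unit names or symbols for a product unit such as torque."""
    if width in ("short", "narrow"):
        return DOT_OPERATOR.join(components)
    if width == "long":
        return "-".join(components)
    raise ValueError(f"unknown width: {width}")

print(product_unit_name(["lbf", "ft"], "short"))
print(product_unit_name(["newton", "meter"], "long"))
```

Note that the thread uses two different dot characters: U+00B7 MIDDLE DOT in "N·m" and U+22C5 DOT OPERATOR in "lbf⋅ft". Real data would need to choose one consistently.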

energy-british-thermal-unit BTU, British thermal unit BTU, British thermal unit, is a traditional unit of heat {0} btu {0} BTU {0} BTU {0} BTU

Apparently, this one is denoted "Btu". Its definition also varies (a bit): see https://en.wikipedia.org/wiki/British_thermal_unit#Definitions. Further, it is only of very local importance/use, and should NOT be offered up for translation around the world.

volume-barrel-oil oil barrels Unit of volume for crude oil. One barrel equals 42 US gallons {0} barrels {0} barrels {0} barrel {0} barrels

For oil, the unit would be "oil barrel" or (oddly) "blue barrel"; other "barrel" units refer to several different amounts. From en.wikipedia.org/wiki/Barrel_(unit): "bbl".

comment:3 Changed 2 months ago by mark

We have confirmation that we don't need the angle-astronomical-X units: people can just use the duration units.

comment:4 follow-up: ↓ 8 Changed 2 months ago by shane

So, to be clear, my understanding is that the following units are going in without controversy:

mass-atomic-mass-unit, except maybe change the symbol to "Da"
energy-electronvolt
force-newtons
pressure-kilopascal
pressure-megapascal
mass-earth-mass
light-solar-luminosity
mass-solar-mass
length-solar-radius, except maybe change the ID to length-nominal-solar-radius??
concentr-mole
torque-newton-meters
volume-barrel-oil, except maybe change the symbol to "bbl" (and the ID to volume-barrel??)

The following units were eliminated based on discussions with astronomers:

angle-astronomical-hour
angle-astronomical-minute
angle-astronomical-second

And the following units are ones that kent.karlsson14 suggests should be removed:

area-dunam
torque-pound-foot
energy-british-thermal-unit

comment:5 follow-up: ↓ 7 Changed 2 months ago by mark

As for Kent's concerns: these units should be added to the English coverage level (as is already done with "foot", etc.). So the list should be:

mass-atomic-mass-unit, except maybe change the symbol to "Da" (need to consider this)
energy-electronvolt
force-newtons
pressure-kilopascal
pressure-megapascal
mass-earth-mass
light-solar-luminosity
mass-solar-mass
length-solar-radius, except maybe change the ID to length-nominal-solar-radius??
concentr-mole
torque-newton-meters
volume-barrel-oil, except maybe change the symbol to "bbl" (and the ID to volume-barrel??)

The following units were eliminated based on discussions with astronomers:

angle-astronomical-hour
angle-astronomical-minute
angle-astronomical-second

Note for units that vary by country. Our main goal is the names of units. I think we probably want to have the general term defined first (volume-barrel, bbl). If it gets to the point where we need to have more specific versions we can add distinct additional types (barrel-oil).

Also, the changes to the spec need to be recorded in http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Modifications. See the previous modifications for examples of how we do this.

comment:6 Changed 2 months ago by mark

  • Cc pedberg, kristi added

comment:7 in reply to: ↑ 5 Changed 2 months ago by kent.karlsson14@…

Replying to mark:

Note for units that vary by country. Our main goal is the names of units.

It is unfortunate that, to this day, some units (of a particular name) stand for different 'capacities' in different places. Just picking up the name (disregarding the difference in definition) is pointless. Instead, such "units" should simply not be used, especially not in i18n data.

I think we probably want to have the general term defined first (volume-barrel, bbl). If it gets to the point where we need to have more specific versions we can add distinct additional types (barrel-oil).

IIUC, bbl is only used for oil barrels (nominally of a standard size). Other barrels come in so many different sizes, and aren't really used as units (outside of very limited applications), as far as I can tell.

(Nit: https://en.wikipedia.org/wiki/Barrel_(unit) says "A volume of 1 bbl is exactly equivalent to a volume of 158.987294928 litres.". I bet no-one comes even close to that precision in practice, even under the best laboratory conditions... With sticky crude oil to boot.)
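The many-digit figure is exact because it follows from definitions, not from measurement: 1 inch = 2.54 cm exactly, 1 US gallon = 231 cubic inches exactly, and 1 oil barrel = 42 US gallons. A Python sketch with exact fractions (the constant names are illustrative):

```python
from fractions import Fraction

# Exact definitional chain: inch -> cubic inch -> US gallon -> oil barrel.
CUBIC_INCH_IN_LITRES = Fraction(254, 100) ** 3 / 1000  # (2.54 cm)^3, in litres
US_GALLON_IN_LITRES = 231 * CUBIC_INCH_IN_LITRES       # 231 cubic inches
OIL_BARREL_IN_LITRES = 42 * US_GALLON_IN_LITRES        # 42 US gallons

print(float(US_GALLON_IN_LITRES))   # 3.785411784
print(float(OIL_BARREL_IN_LITRES))  # 158.987294928
```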

comment:8 in reply to: ↑ 4 Changed 2 months ago by kent.karlsson14@…

Replying to shane:

mass-atomic-mass-unit, except maybe change the symbol to "Da"

Following CLDR (apparent) policy, that would be two units (even though Da and u are the same).

mass-earth-mass
light-solar-luminosity
mass-solar-mass
length-solar-radius, except maybe change the ID to length-nominal-solar-radius??

All of these should be interpreted as nominal, as they really either vary or are not accurately known. Don't care much about the CLDR IDs, at this point. The IDs are a mess anyway...

And the following units are ones that kent.karlsson14 suggests should be removed:

area-dunam

Yes, remove.

torque-pound-foot
energy-british-thermal-unit

I don't at all mind removing them. But I said that they are only of regional interest, not global interest.

comment:9 Changed 2 months ago by shane

I added five units to start.

I added mass-dalton instead of mass-atomic-mass-unit.

The unit solar-luminosity is a measurement of *radiant flux*, which is dimensionally equivalent to power (watt), whereas the other unit with type light, lux, is a measurement of *illuminance* (luminous flux per unit area), the photometric counterpart of irradiance, or power per unit area (watt per square meter). In other words, the two units measure different quantities. See:

https://en.wikipedia.org/wiki/Solar_luminosity
https://en.wikipedia.org/wiki/Lux

For now, I added solar-luminosity in type "light", but this may not be correct. It could be argued that a more correct category for solar-luminosity would be "power", or even an entirely new category.
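Whether two units can share a "type" is the open question here, but whether they are convertible can be checked mechanically by comparing dimensions. A minimal Python sketch, assuming a hand-rolled map of SI base-dimension exponents (not a CLDR API). The candela is treated as its own base dimension and the steradian is omitted as dimensionless. Lux is therefore not convertible to watts, nor even to W/m².

```python
from __future__ import annotations

def dim(**exponents: int) -> dict[str, int]:
    """Build a dimension as {base_dimension: exponent}, dropping zero exponents."""
    return {k: v for k, v in exponents.items() if v != 0}

def divide(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Dimension of a quotient: subtract exponents."""
    result = {}
    for k in set(a) | set(b):
        e = a.get(k, 0) - b.get(k, 0)
        if e:
            result[k] = e
    return result

WATT = dim(kg=1, m=2, s=-3)         # power (J/s)
SOLAR_LUMINOSITY = WATT             # radiant power emitted by the Sun
SQUARE_METRE = dim(m=2)
WATT_PER_SQUARE_METRE = divide(WATT, SQUARE_METRE)  # irradiance
LUX = dim(cd=1, m=-2)               # lm/m^2 = cd*sr/m^2

def convertible(a: dict[str, int], b: dict[str, int]) -> bool:
    return a == b

print(convertible(SOLAR_LUMINOSITY, WATT), convertible(SOLAR_LUMINOSITY, LUX))
```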

I did not subscript the earth or sun signs in those units, because there is no good way to subscript them in Unicode that I am aware of. (Mark: could this be an excuse to finally add superscript/subscript modifier code points? That would be super useful both here and in many other situations I've encountered.)

The table in the OP said to use "solar radiuses" for the English plural form. I changed that to "solar radii", which has many more hits on Google than the other form.

I will follow up with the rest once Mark has a chance to verify that I did these first five correctly.

comment:10 follow-up: ↓ 11 Changed 2 months ago by mark

We might want to have an alt variant name for dalton, namely amu....

comment:11 in reply to: ↑ 10 Changed 2 months ago by kent.karlsson14@…

Replying to mark:

We might want to have an alt variant name for dalton, of amu....

In other cases of "equal" units (like cubic centimeter and milliliter), you have simply had two separate entries, no "alt".
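The "separate entries" approach described here means each ID carries its own data, even when two IDs denote the same quantity. A sketch with a hypothetical unit table (factors to SI base units). The dalton factor is the CODATA 2018 value, which is measured, not exact. The `mass-atomic-mass-unit` row stands for the alias discussed in comments 10 and 11 and is illustrative only.

```python
# Hypothetical unit table: ID -> (SI base unit, factor to that base unit).
UNITS = {
    "volume-cubic-centimeter": ("m3", 1e-6),
    "volume-milliliter": ("m3", 1e-6),
    "mass-dalton": ("kg", 1.66053906660e-27),            # CODATA 2018
    "mass-atomic-mass-unit": ("kg", 1.66053906660e-27),  # hypothetical alias entry ("u")
}

def same_quantity(a: str, b: str) -> bool:
    """True if two separate unit entries denote the same amount of the same quantity."""
    base_a, factor_a = UNITS[a]
    base_b, factor_b = UNITS[b]
    return base_a == base_b and factor_a == factor_b

print(same_quantity("mass-dalton", "mass-atomic-mass-unit"))
```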

comment:12 Changed 2 months ago by mark

About the equal units: good point, Kent, I think that bears discussion in the committee.

Shane, about the non-SI units (btu, dunam,...) what we do is put this into %unitsEnglish in the coverage, which makes them translated only at a modern level.

comment:13 Changed 2 months ago by mark

  • Priority changed from assess to major
  • Status changed from new to accepted
  • Milestone changed from to-assess to 35

comment:14 Changed 2 months ago by shane

I'm almost ready with the next changeset to add the remaining units for translation.

Open Questions:

  1. Any thoughts on what I said above (comment:9) about what type to use for solar-luminosity? I *don't* think we should re-use the type "light" since the two units cannot be converted between each other, and it seems like that is what "type" implies.
  2. Should I add mass-atomic-mass-unit now, or stick with mass-dalton?
  3. Btu or BTU? I used Btu.
  4. What are the requirements for being in %unitsEnglish versus %unitsCommonUS versus %unitsCommonMetric? None of the units being added are particularly "common". I added Btu to %unitsEnglish as discussed above.
  5. area-dunam is not an English unit; it seems to be most common in the Middle East and especially Turkey; where should it go in coverageLevels.xml?
  6. Does volume-barrel need to be in coverageLevels.xml? It has the problem similar to volume-cup where a barrel has different definitions depending on the region, but that unit does not appear in coverageLevels.xml as far as I can tell.
  7. area-dunam does not appear to have a symbol as far as I can tell; is it okay to put dunam as the symbol directly in root.xml?

Additional notes:

  • I added force-pound-force in addition to force-newton in order to be consistent with the units torque-pound-foot and torque-newton-meter
  • Some of the unit IDs in the OP were plural; I made them all singular, consistent with the pre-existing units
  • I plan to follow up with one additional changeset to alphabetize the entries in validity/unit.xml and DtdData.java
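Pairing force-pound-force with torque-pound-foot mirrors newton and newton-meter: each torque unit is its force unit times a length. A Python sketch using the exact definitions of the avoirdupois pound (0.45359237 kg), standard gravity (9.80665 m/s²), and the foot (0.3048 m). Variable names are illustrative.

```python
from fractions import Fraction

# Exact definitions.
POUND_KG = Fraction("0.45359237")
STANDARD_GRAVITY = Fraction("9.80665")  # m/s^2
FOOT_M = Fraction("0.3048")

POUND_FORCE_N = POUND_KG * STANDARD_GRAVITY  # force-pound-force, in newtons
POUND_FOOT_NM = POUND_FORCE_N * FOOT_M       # torque-pound-foot (lbf*ft), in newton-meters

print(float(POUND_FORCE_N))  # 4.4482216152605
print(float(POUND_FOOT_NM))
```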

comment:15 Changed 8 weeks ago by shane

  • Status changed from accepted to reviewing
  • Review set to mark

My changes are committed. However, please look through my questions; there may be follow-up changes required.

Additional question:

  1. What is the standard for display name of the short form? The long form display name tends to be the suffix of the "other" category, and the narrow form tends to be the symbol. However, the display name of the short form is sometimes the singular full name (closer to the long form) and sometimes the symbol (closer to the narrow form).
Last edited 8 weeks ago by shane

comment:16 Changed 8 weeks ago by shane

Seems like there are some test failures. It's not immediately clear to me how to go about fixing them. I think I edited every file that needed to be edited. Can you take a look? The only thing I can think of would be that the middle-dot "⋅" might need to be added to SimpleXMLSource.java similar to how you added the earth and sun signs.

comment:17 Changed 8 weeks ago by mark

The errors are:

  1. Error: (TestAttributeValues.java:71) : Invalid units in English (may be problem with English or Validity): expected java.util.Collections$EmptySet<[]>, got java.util.TreeSet<[torque-newton-meter]>

The Validity file needs to be modified so that it reflects the new units. That is a new test to prevent them from getting out of sync.

  2. Error: (TestAll.java:167) java.lang.IllegalArgumentException: Missing Map Comparator value(s): torque-newton-meter(null), digital-byte(38), java.lang.IllegalArgumentException: Missing Map Comparator value(s): torque-newton-meter(null), digital-byte(38),

There is a comparator for all the units that needs to be updated. (It is used to sort the units on the page for users.) It is in DtdData$DtdComparator.

  3. TestCompoundUnit {

Error: (TestExampleGenerator.java:304) : CompoundUnit: expected "〖❬1.00 meter❭ per ❬second❭〗", got "〖❬1 meter❭ per ❬second❭〗"

For some reason, the example generator used to have min=2 decimal digits and now generates as if min=0 decimal digits. Not sure why that is happening.
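The first two failures are both consistency checks between lists that must be kept in sync. The real tests are Java (TestAttributeValues, DtdData$DtdComparator); the following is only a Python illustration of the idea, with hypothetical function names and message texts, not a port of those tests.

```python
from __future__ import annotations

def missing_from(reference: set[str], candidates: set[str]) -> set[str]:
    """Return the candidates that are absent from the reference set."""
    return candidates - reference

def check_units(english: set[str], validity: set[str], order: dict[str, int]) -> list[str]:
    """Every unit with English data must be valid; every valid unit needs a sort position."""
    errors = []
    unknown = missing_from(validity, english)
    if unknown:
        errors.append(f"Invalid units in English: {sorted(unknown)}")
    unordered = missing_from(set(order), validity)
    if unordered:
        errors.append(f"Missing comparator value(s): {sorted(unordered)}")
    return errors

english = {"force-newton", "torque-newton-meter"}
print(check_units(english, {"force-newton"}, {"force-newton": 0}))
```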

comment:18 Changed 8 weeks ago by mark

  • Status changed from reviewing to reviewfeedback

Thanks for getting these in. I'll try to finish this off before your morning.

comment:19 Changed 8 weeks ago by mark

PS
Validity is in trunk/common/validity/unit.xml
Comparator is in trunk/tools/java/org/unicode/cldr/util/DtdData.java

cf. https://unicode.org/cldr/trac/changeset/14677

We really should have a page to describe the changes!

comment:20 Changed 8 weeks ago by mark

Turns out there were just a couple of typos in unit.xml and DtdData.java.

comment:21 follow-up: ↓ 29 Changed 8 weeks ago by shane

LGTM on the fixes; thanks!

Thoughts on my questions 1-8? I will repeat them here:

  1. Any thoughts on what I said above (comment:9) about what type to use for solar-luminosity? I *don't* think we should re-use the type "light" since the two units cannot be converted between each other, and it seems like that is what "type" implies.
  2. Should I add mass-atomic-mass-unit now, or stick with mass-dalton?
  3. Btu or BTU? I used Btu.
  4. What are the requirements for being in %unitsEnglish versus %unitsCommonUS versus %unitsCommonMetric? None of the units being added are particularly "common". I added Btu to %unitsEnglish as discussed above.
  5. area-dunam is not an English unit; it seems to be most common in the Middle East and especially Turkey; where should it go in coverageLevels.xml?
  6. Does volume-barrel need to be in coverageLevels.xml? It has the problem similar to volume-cup where a barrel has different definitions depending on the region, but that unit does not appear in coverageLevels.xml as far as I can tell.
  7. area-dunam does not appear to have a symbol as far as I can tell; is it okay to put dunam as the symbol directly in root.xml?
  8. What is the standard for display name of the short form? The long form display name tends to be the suffix of the "other" category, and the narrow form tends to be the symbol. However, the display name of the short form is sometimes the singular full name (closer to the long form) and sometimes the symbol (closer to the narrow form).

comment:22 Changed 8 weeks ago by shane

  • Status changed from reviewfeedback to reviewing

comment:23 Changed 8 weeks ago by mark

As far as your questions go, I think all you've done looks reasonable.

  1. type is a general domain, don't think we necessarily expect convertibility.
  2. we should probably have a new variable for dunam to have it be moderate for certain countries and modern for the rest.
  3. the top coverage for all units should be modern, so that they don't need to be listed specifically.
  4. dunam is fine in root (best we can do)
  5. it has been the symbol where it exists, but otherwise as short a full name as possible.
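The coverage answers above amount to a simple rule: units default to modern, with a possible per-locale override (for example, dunam at moderate in some locales). A simplified Python sketch using the CLDR level ordering basic < moderate < modern < comprehensive. The override table and its locale are purely illustrative, not CLDR data, and comment:27 later decided not to add such a variable for this release.

```python
# Simplified coverage sketch; names and the override table are hypothetical.
LEVELS = ["basic", "moderate", "modern", "comprehensive"]
RANK = {name: i for i, name in enumerate(LEVELS)}

DEFAULT_UNIT_LEVEL = "modern"
LOCALE_OVERRIDES = {("area-dunam", "tr"): "moderate"}  # illustrative only

def unit_level(unit_id: str, locale: str) -> str:
    return LOCALE_OVERRIDES.get((unit_id, locale), DEFAULT_UNIT_LEVEL)

def is_required(unit_id: str, locale: str, target: str) -> bool:
    """A unit needs translation if its level is at or below the locale's target level."""
    return RANK[unit_level(unit_id, locale)] <= RANK[target]

print(is_required("area-dunam", "tr", "moderate"), is_required("area-dunam", "de", "moderate"))
```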

comment:24 Changed 8 weeks ago by mark

  • Status changed from reviewing to reviewfeedback

Shane, I looked over your changes and they look good (although we should probably make dunam moderate for certain locales, that won't matter for this limited-submission release).

Can you review my changes, then we'll close?

comment:25 Changed 8 weeks ago by shane

Your commits on this ticket LGTM.

Thanks for your answers to questions 1 and 5-8. It looks like items 1 and 6-8 require no action. Can you comment on questions 2-4, and clarify the next steps for item 5?

comment:26 Changed 8 weeks ago by shane

  • Status changed from reviewfeedback to reviewing

comment:27 Changed 8 weeks ago by mark

#2 no action necessary for this release; not worth a new variable for a single item, esp since we don't have any target=Moderate this release.
#3 no action necessary; e.g., dunam is already in modern: http://st.unicode.org/cldr-apps/v#/de/Area/
#4 no action necessary; dunam is fine.

#5 Only action needed is if there is a reasonable abbreviation for some of the new units. For example, "dunam" is probably the shortest we can have and be understandable. So if you have any suggestions...

comment:28 Changed 8 weeks ago by shane

  • Status changed from reviewing to closed
  • Resolution set to fixed

Okay. SGTM. Closing the ticket.

comment:29 in reply to: ↑ 21 Changed 6 weeks ago by kent.karlsson14@…

Replying to shane:

  1. Any thoughts on what I said above (comment:9) about what type to use for solar-luminosity? I *don't* think we should re-use the type "light" since the two units cannot be converted between each other, and it seems like that is what "type" implies.

Right. It should be under "energy and power". It is about 383 YW (yottawatts). Not yet too late to fix.

comment:30 Changed 5 weeks ago by shane

Right. It should be under "energy and power". It is about 383 YW (yottawatts). Not yet too late to fix.

Okay.

This partially contradicts what Mark suggested to do:

type is a general domain, don't think we necessarily expect convertibility.

I'm opening this as a new ticket.

https://unicode.org/cldr/trac/ticket/11703
