Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #7214 (closed enhancement: fixed)

Opened 4 years ago

Last modified 3 years ago

add more units for the next release

Reported by: mark | Owned by: mark
Component: main | Data Locale:
Phase: dsub | Review: pedberg
Weeks: | Data Xpath:
Xref:

Description

Draft list is on https://docs.google.com/spreadsheets/d/1Gun2paQQwLA59460Z9dvTYVKhpM-TBedVAL25TLorGs/edit#gid=0

In addition, add structure with the SI prefixes, and specify how to use those to dynamically form names. One question is whether they should be patterns (e.g. yotta{0}) or just simple strings. The former would allow more flexibility, but I'm not sure whether it is needed.
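A minimal sketch of the two options, assuming invented data: the PREFIX_PATTERNS table, the function names and the second locale "xx" (in which the prefix follows the unit name) are hypothetical and are not CLDR syntax. It shows what a pattern such as yotta{0} can express that a plain prefix string cannot; only the SI exponents are standard values.

```python
# Hypothetical per-locale prefix data: each value is a pattern in which {0}
# stands for the base unit name. Keys and structure are illustrative only.
PREFIX_PATTERNS = {
    "en": {"kilo": "kilo{0}", "centi": "centi{0}", "milli": "milli{0}"},
    # Invented locale "xx" in which the prefix follows the unit name,
    # something plain concatenation cannot express.
    "xx": {"kilo": "{0}-kilo", "centi": "{0}-centi", "milli": "{0}-milli"},
}

# Decimal exponent of each SI prefix (these values are standard).
PREFIX_EXPONENTS = {"kilo": 3, "centi": -2, "milli": -3}


def name_from_pattern(locale: str, prefix: str, unit_name: str) -> str:
    """Form a prefixed unit name by substituting the unit into the locale's pattern."""
    return PREFIX_PATTERNS[locale][prefix].format(unit_name)


def name_from_string(prefix_string: str, unit_name: str) -> str:
    """The 'simple string' alternative: the prefix is always prepended."""
    return prefix_string + unit_name


def scale_to_base(value: float, prefix: str) -> float:
    """Convert a prefixed quantity to the base unit, e.g. 5 kilometers -> 5000 meters."""
    return value * 10.0 ** PREFIX_EXPONENTS[prefix]


print(name_from_pattern("en", "centi", "liter"))  # centiliter
print(name_from_pattern("xx", "kilo", "meter"))   # meter-kilo
print(name_from_string("milli", "liter"))         # milliliter
print(scale_to_base(5, "kilo"))                   # 5000.0
```

If every locale simply prepends the prefix, both approaches produce identical names and simple strings would suffice; patterns only pay off for languages where the prefix is placed or inflected differently.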

After discussion, here are some changes to the above that haven't yet been added:

Drop: farad, coulomb, lumen, candela, mole, pascal, kiloliter, energy-kilojoules

Add:
centiliter, decimeter, lux

(only certain locales, Fredrik to determine): scandinavian-mile, kryddmått (need English name), deciliter-per-kilometer and liter-per-scandinavian-mile (fuel consumption), scandinavian-cubic-mile and scandinavian-mile and scandinavian-square-mile

en-modern-only: stone, bushel, cup, tablespoon, teaspoon

Change the "Metric?" column to be a flag indicating that SI prefixes are applicable.


Change History

comment:1 Changed 4 years ago by kent.karlsson14@…

Drop: farad, coulomb, lumen, candela, mole, pascal, kiloliter, energy-kilojoules

I see no reason whatsoever to drop kilojoules. It is a commonly used unit.

But you really need to drop amount-carat (not mass-carat), foodcalorie (will confuse everyone!), fathom, furlong, bushel (three units no longer in use), stone (disappearing, fortunately), and millimeter of mercury (no longer used). I think megagram is unusual, but OK to keep.

You also really MUST replace gigabyte with gibibyte, kilobyte with kibibyte, megabyte with mebibyte, and terabyte with tebibyte. There is absolutely no reason to support old bad habits, now that the binary prefixes have long been standardised and are coming into very common use.

Add: centiliter, decimeter, lux

Good.

(only certain locales, Fredrik to determine):
scandinavian-mile, kryddmått (need English name),
deciliter-per-kilometer and liter-per-scandinavian-mile (fuel consumption),
scandinavian-cubic-mile and scandinavian-mile and scandinavian-square-mile

If you are introducing a mechanism for limiting some units to certain locales, that is fine.
But then the US/UK-only units (i.e. the imperial units) MUST also be limited to US/UK in CLDR.

kryddmått = spice-measure (literally)

en-modern-only: stone, bushel, cup, tablespoon, teaspoon

But tablespoon (15ml) and teaspoon (5ml) are commonly used also in Europe (in recipes),
maybe also in other parts of the world.

The types for the ”new” (to CLDR) units need to be fixed as well.

See also my more detailed comments in https://docs.google.com/spreadsheets/d/1vlJYYbJNNP7g_FBGUW1IFEqlrw71AeMdLIr0m5SV1no/edit?usp=sharing.

comment:2 Changed 4 years ago by emmons

  • Owner changed from anybody to mark
  • Priority changed from assess to critical
  • Status changed from new to assigned
  • Component changed from unknown to data
  • Milestone changed from UNSCH to 26dsub

comment:3 Changed 4 years ago by emmons

  • Type changed from unknown to enhancement

comment:4 Changed 3 years ago by mark

Just added first draft for the new units. Held back on some of them until we could clear them up.

TODO: some of the new units will be specific to just en, or just sv/nb/nn. That needs to be adjusted in Coverage.

comment:5 (follow-up: comment:7) Changed 3 years ago by kent.karlsson14@…

Split:

”consumption”: split! inverses should not share a category; split into

"consumption" (volume per distance) and
"mileage" (for lack of better word; distance per volume)

”electric”: split! different ”physical quantities” should NOT share a category; split into

(electriccapacitance (*farad))
electriccurrent (*ampere)
electriccurrentcapacity (*ampere-<time>)
electricpotential (*volt)
electricresistance (*ohm)

”light”: split! different ”physical quantities” should NOT share a category; so far only

luminousemittance

”digital”: split, storage size and transfer speed should not share a category; see below under "Add".
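The consumption/mileage split proposed above rests on the two being reciprocals rather than scaled versions of each other. A small sketch converting liters per 100 km to US miles per gallon illustrates this (mpg is used only as an example of a distance-per-volume unit, and the function names are hypothetical); the gallon and mile factors are exact by definition.

```python
# Exact conversion factors (by definition of the US gallon and the mile).
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def l_per_100km_to_mpg_us(l_per_100km: float) -> float:
    """Convert consumption (volume per distance) to mileage (distance per volume)."""
    km_per_liter = 100.0 / l_per_100km  # the reciprocal is the essential step
    return km_per_liter * LITERS_PER_US_GALLON / KM_PER_MILE


def mpg_us_to_l_per_100km(mpg: float) -> float:
    """Inverse conversion: mileage back to consumption."""
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100.0 / km_per_liter


# Halving consumption doubles mileage: the relation is reciprocal, not linear.
print(round(l_per_100km_to_mpg_us(10.0), 2))  # 23.52
print(round(l_per_100km_to_mpg_us(5.0), 2))   # 47.04
```

Within one category, units differ only by a constant factor; between these two, conversion needs a reciprocal, so "better" values run in opposite directions.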

Delete:

”amount-karat”: what amount? (There is a carat used for "purity" (usually for gold), but that is like a percentage [parts-per-24 rather than parts-per-hundred], not a unit.) ”mass-carat” is ok.
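A sketch contrasting the two carats (function names hypothetical): purity in karats is a proportion bounded by 24, while the metric carat is a mass unit of exactly 200 mg, so any non-negative amount of it is meaningful.

```python
GRAMS_PER_METRIC_CARAT = 0.2  # the metric carat is defined as 200 mg


def purity_fraction(karat: float) -> float:
    """Purity in karats is a proportion in parts per 24, so it is bounded."""
    if not 0 <= karat <= 24:
        raise ValueError("purity must lie between 0 and 24 karats")
    return karat / 24


def carat_to_grams(carats: float) -> float:
    """Mass in carats is an ordinary unit: any non-negative amount is meaningful."""
    return carats * GRAMS_PER_METRIC_CARAT


print(f"{purity_fraction(18):.0%}")   # 75%
print(round(carat_to_grams(239), 1))  # 47.8
```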

”foodcalorie” does not make sense.
<unit type="pressure-millimeter-of-mercury">, old and now unused
The following have various errors (see replacement below):
<unit type="digital-gigabit">
<unit type="digital-gigabyte">
<unit type="digital-kilobit">
<unit type="digital-kilobyte">
<unit type="digital-megabit">
<unit type="digital-megabyte">
<unit type="digital-terabit">
<unit type="digital-terabyte">

Add:

<unit type="digitaldatarate-gigabit-per-second">
<unit type="digitaldataamount-gibibyte">
<unit type="digitaldatarate-kilobit-per-second">
<unit type="digitaldataamount-kibibyte">
<unit type="digitaldatarate-megabit-per-second">
<unit type="digitaldataamount-mebibyte">
<unit type="digitaldatarate-terabit-per-second">
<unit type="digitaldataamount-tebibyte">
ampere-hour
milliampere-hour
length-scand-mile
area-scand-square-mile
volume-scand-cubic-mile
volume-cup
volume-tablespoon
volume-teaspoon
volume-spicemeasure

Typos:

{0} liter per kilometers --> {0} liters per kilometer

Better "type" (category) names:

<unit type="mass-metric-ton"> -->> <unit type="mass-tonne">
<unit type="mass-ton"> -->> <unit type="mass-short-ton">
"acceleration-meter-per-second-squared" --> "acceleration-meter-per-second-per-second"

comment:6 Changed 3 years ago by kent.karlsson14@…

{0} liter per kilometers --> {0} liters per kilometer

Better to replace with deciliter(s) per kilometer; liter per kilometer is too large (assuming that this refers to cars rather than jet aircraft...); even dL/km is a bit large... (cL/km...)
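The magnitude argument can be checked with a short sketch, assuming an illustrative car consuming 6 L/100 km (the figure and the function name are assumptions, not from the ticket). It also shows that, because a Scandinavian mile is 10 km, liters per Scandinavian mile and deciliters per kilometer are numerically identical.

```python
def consumption_views(l_per_100km: float) -> dict:
    """Express one fuel-consumption figure in several volume-per-distance units."""
    l_per_km = l_per_100km / 100
    return {
        "L/km": l_per_km,
        "dL/km": l_per_km * 10,   # 1 L = 10 dL
        "cL/km": l_per_km * 100,  # 1 L = 100 cL
        # A Scandinavian mile (mil) is 10 km, so L/mil equals dL/km numerically.
        "L/mil": l_per_km * 10,
    }


for unit, value in consumption_views(6.0).items():
    print(f"{value:g} {unit}")
```

This prints 0.06 L/km, 0.6 dL/km, 6 cL/km and 0.6 L/mil, which is why L/km yields awkwardly small numbers for cars.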

comment:7 (in reply to: comment:5) Changed 3 years ago by mark

Replying to kent.karlsson14@…:

Split:

”consumption”: split! inverses should not share a category; split into

"consumption" (volume per distance) and
"mileage" (for lack of better word; distance per volume)

”electric”: split! different ”physical quantities” should NOT share a category; split into

(electriccapacitance (*farad))
electriccurrent (*ampere)
electriccurrentcapacity (*ampere-<time>)
electricpotential (*volt)
electricresistance (*ohm)

”light”: split! different ”physical quantities” should NOT share a category; so far only

luminousemittance

”digital”: split, storage size and transfer speed should not share a category; see below under "Add".

The categories do not necessarily correspond to different physical quantities; they are simply groupings for organizational convenience. However, I will bring this up with the committee as to whether we should change this.

Delete:

”amount-karat”: what amount? (There is a carat used for "purity" (usually for gold), but that is like a percentage [parts-per-24 rather than parts-per-hundred], not a unit.)

The karat is used. 7,810,000 instances on Google, compared to only 59,000 for kryddmått.

I agree that "amount" is ugly. Will bring that up to the committee as well. It is really a proportion.

”mass-carat” is ok.

”foodcalorie” does not make sense.

Foodcalorie does make sense; see Peter's email comments. It is just a different name for an existing quantity (kcal). But you yourself are proposing the same kind of thing with scandinavian mile (= 10 km) or spicemeasure (= 1 ml).

<unit type="pressure-millimeter-of-mercury">, old and now unused

Unused? Again, you appear to be only considering regional usage. 1,570,000 hits on Google.

The following have various errors (see replacement below):
<unit type="digital-gigabit">
<unit type="digital-gigabyte">
<unit type="digital-kilobit">
<unit type="digital-kilobyte">
<unit type="digital-megabit">
<unit type="digital-megabyte">
<unit type="digital-terabit">
<unit type="digital-terabyte">

Add:

<unit type="digitaldatarate-gigabit-per-second">
<unit type="digitaldataamount-gibibyte">
<unit type="digitaldatarate-kilobit-per-second">
<unit type="digitaldataamount-kibibyte">
<unit type="digitaldatarate-megabit-per-second">
<unit type="digitaldataamount-mebibyte">
<unit type="digitaldatarate-terabit-per-second">
<unit type="digitaldataamount-tebibyte">

As to the per second, we are using a different mechanism.

As to gibibyte, you are disregarding previous email on this topic. While a laudable attempt, it is simply not in widescale use.

"...According to a Google search, there are 305 times (30500%) more instances of gigabyte...."

Please do not repeat requests that have already clearly been denied, unless you have new compelling evidence to present.

ampere-hour
milliampere-hour
length-scand-mile
area-scand-square-mile
volume-scand-cubic-mile
volume-cup
volume-tablespoon
volume-teaspoon
volume-spicemeasure

We are adding (they are already in) the cup, tablespoon/teaspoon, and some Scandinavian miles, but in modern coverage only for some countries. See the latest checkin.

We cannot add all units: it is unclear whether we need ampere-hour or milliampere-hour, or volume-scand-cubic-mile.

Typos:

{0} liter per kilometers --> {0} liters per kilometer

was fixed.

Better "type" (category) names:

<unit type="mass-metric-ton"> -->> <unit type="mass-tonne">
<unit type="mass-ton"> -->> <unit type="mass-short-ton">
"acceleration-meter-per-second-squared" --> "acceleration-meter-per-second-per-second"

"Better" is debatable, but I'll bring these to the committee as well.

comment:8 Changed 3 years ago by mark

From the committee discussion:

  1. Peter & Mark delegated to review and possibly change categories, and possibly change other names like metric-ton, ton, meter-per-second-squared.
  2. Change 'amount' to 'proportion'.

comment:9 Changed 3 years ago by kent.karlsson14@…

Replying to mark:

Replying to kent.karlsson14@…:

Split:

”consumption”: split! inverses should not share a category; split into

"consumption" (volume per distance) and
"mileage" (for lack of better word; distance per volume)

...

The categories do not necessarily correspond to different physical quantities; they are simply groupings for organizational convenience. However, I will bring this up with the committee as to whether we should change this.

For all units in CLDR 25 each category has been for one and only one physical quantity. It seems to be a bad idea to depart from that.

Delete:

”amount-karat”: what amount? (There is a carat used for "purity" (usually for gold), but that is like a percentage [parts-per-24 rather than parts-per-hundred], not a unit.)

The karat is used. 7,810,000 instances on Google, compared to only 59,000 for kryddmått.

I never said it wasn't used. But it is not a unit in the same sense as the other unit items in CLDR. For instance, "239 carat gold" (referring to the purity, not the weight) does not make sense. And yes, the two different kinds of carat (weight and purity) are spelled the same (even though one *can* make an artificial distinction in English: c/k); only context decides which is meant. That is another reason not to include this one in CLDR.

I agree that "amount" is ugly. Will bring that up to the committee as well. It is really a proportion.

”mass-carat” is ok.

”foodcalorie” does not make sense.

Foodcalorie does make sense; see Peter's email comments. It is just a different name for an existing quantity (kcal). But you yourself are proposing the same kind of thing with scandinavian mile (= 10 km) or spicemeasure (= 1 ml).

(I haven't gotten a copy of that email...)

That is not the problem. The problem with this one is that, besides being a bad idea (food products ARE marked with kJ as well as kcal (kilocalories), not "food calories"), it will confuse translators to no end. It will also confuse application programmers to no end.

Wikipedia: "Sometimes, in an attempt to avoid confusion, the large calorie is written as "Calorie" (with a capital "C"). This convention is not always followed, and not explained to the average person clearly." Indeed, and not at all helpful, especially not in a CLDR context. That people say "calorie" while referring to kilocalorie is an error, not something to be supported in any way.
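A sketch of the factor-1000 ambiguity under discussion, assuming the thermochemical calorie of exactly 4.184 J (the International Table calorie, 4.1868 J, differs slightly); the function names are hypothetical. A food "Calorie" is one kilocalorie.

```python
J_PER_THERMOCHEMICAL_CAL = 4.184  # 1 cal (thermochemical) = 4.184 J exactly


def kcal_to_kj(kcal: float) -> float:
    """Convert kilocalories (food "Calories") to kilojoules."""
    # 1 kcal = 4184 J = 4.184 kJ, so the same factor applies at the kilo scale.
    return kcal * J_PER_THERMOCHEMICAL_CAL


def kcal_to_cal(kcal: float) -> float:
    """The same quantity in small calories: the source of the factor-1000 confusion."""
    return kcal * 1000


print(round(kcal_to_kj(250), 1))  # 1046.0
print(kcal_to_cal(250))           # 250000
```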

<unit type="pressure-millimeter-of-mercury">, old and now unused

Unused? Again, you appear to be only considering regional usage. 1,570,000 hits on Google.

Claimed (Wikipedia) to now be mainly used in medicine (for blood pressure); NOT for other pressures (like in meteorology or engineering). The other pressure units in CLDR are clearly geared towards meteorology, and are listed under weather. ST/beta erroneously has mm Hg under "weather", not "medicine", which would be appropriate if you insist on including it (as you say CLDR can't include all units, but you seem determined to include every one of the imperial units). For medicine there are then several other units that should be included, such as ml/h, various concentration units (for medications, blood gases), and others (I don't have a list).

The following have various errors (see replacement below):
<unit type="digital-gigabit">
<unit type="digital-gigabyte">

...

Add:

<unit type="digitaldatarate-gigabit-per-second">
<unit type="digitaldataamount-gibibyte">

...

As to the per second, we are using a different mechanism.

Which mechanism? Though I used to support a compositional approach, I no longer do for the "full name" of units, though it would work well for short/narrow forms. It is not a problem with "per" as such, as the use of inflection instead of "per" does not generalise; but still. In any case, large "bit" amounts are normally used only in the context of data rates.
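A minimal sketch of a compositional "per" approach for English full names, assuming a hypothetical "{0} per {1}" pattern and a simplified plural rule in which only the integer 1 takes "one". The data structures are invented for illustration and are not necessarily the mechanism Mark refers to.

```python
# Assumed English data; names and structure are hypothetical.
PER_PATTERN = "{0} per {1}"
UNIT_PATTERNS = {
    "kilobit": {"one": "{0} kilobit", "other": "{0} kilobits"},
    "megabit": {"one": "{0} megabit", "other": "{0} megabits"},
}
SINGULAR_NAMES = {"second": "second", "hour": "hour"}


def plural_category_en(n: int) -> str:
    """Simplified English plural rule for integers: only 1 takes 'one'."""
    return "one" if n == 1 else "other"


def format_per(n: int, numerator: str, denominator: str) -> str:
    """Compose '<n numerator> per <denominator>' from separately stored parts."""
    amount = UNIT_PATTERNS[numerator][plural_category_en(n)].format(n)
    return PER_PATTERN.format(amount, SINGULAR_NAMES[denominator])


print(format_per(1, "megabit", "second"))    # 1 megabit per second
print(format_per(100, "kilobit", "second"))  # 100 kilobits per second
```

This works for English because only the numerator inflects; a language that marks the denominator by case instead of a word like "per" would need more per-unit data, which is the objection raised in the comment.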

As to gibibyte, you are disregarding previous email on this topic. While a laudable attempt, it is simply not in widescale use.

Like it or not, inclusion in CLDR will constitute a recommendation to use. But in this case:

  • The units themselves are not ill-defined; e.g. 1 megabyte is *exactly* one million bytes (fine, assuming *for the moment* that byte is interpreted as exactly 8 bit, which is *not* historically true, though that is the prevalent modern use; I'm sure the French translation will say "octet").
  • However, "megabyte" has much too often been used for "two to the power of twenty bytes" (and similarly for the other prefixes). These units have thus *acquired* an ill-definition, and for that reason they MUST NOT be used, regardless of how many hits you get in search engines. And there are now established alternative units that have not acquired ill-definitions.
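The size of the ambiguity can be computed directly. A sketch, assuming 1 byte = 8 bits (an octet) and using the IEC binary prefixes for comparison; the function name is hypothetical.

```python
def decimal_binary_gap(power: int) -> float:
    """Relative excess of the binary unit (1024**power) over the SI unit (1000**power)."""
    return (1024 ** power - 1000 ** power) / 1000 ** power


UNIT_PAIRS = [("kilobyte", "kibibyte"), ("megabyte", "mebibyte"),
              ("gigabyte", "gibibyte"), ("terabyte", "tebibyte")]

for power, (si_name, iec_name) in enumerate(UNIT_PAIRS, start=1):
    print(f"1 {iec_name} = {1024 ** power:,} B, 1 {si_name} = {1000 ** power:,} B, "
          f"gap {decimal_binary_gap(power):.2%}")
```

The gap grows with each prefix step, from 2.40% at kilo to 9.95% at tera.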

"...According to a Google search, there are 305 times (30500%) more instances of gigabyte...."

Irrelevant, as that reflects unstable and ill-defined usage. It would be irresponsible to recommend their use (and yes, inclusion in CLDR will still be read as a recommendation of use, in this case globally). By arguments like yours, any and all improvement for the better would be impossible.

Please do not repeat requests that have already clearly been denied, unless you have new compelling evidence to present.

See above, as I apparently do need to repeat myself.

ampere-hour

...

volume-spicemeasure

We are adding (they are already in) the cup, tablespoon/teaspoon, and some Scandinavian miles, but in modern coverage only for some countries. See the latest checkin.

OK (but they were not there when I wrote the comment).

However, the "bushels", "fathoms" and other imperial units really do need to be limited to US/GB. Only the inch has a (declining) global use: declining because the sizes of (large) displays are now also given in cm, and the inch has not been all that common for small displays. And yes, jeans sizes, but there the inch is an approximation, and indeed you will find size tables that say things like "size 31 [would be read as inches]....30 in..." etc. (see e.g. http://shop.benetton.com/se_en/outlet/man/denim.html, press "size chart" and see the Denim table). I.e., even though they might look like it, they are not really inch sizes.

Coverage:

  1. '(?!sv|nb|no)' (in coverage): I think you meant '(?!sv|nb|nn)'.
  2. 'key="%notEnglish" value="(?!en)"': this should refer only to English (and Spanish) in the US and GB, not English elsewhere, and in particular not "global English". I.e. imperial units should be requested for translation for en_US (NOT for 'en'), es_US, and en_GB (possibly also cy and other languages used only in GB).
  3. 'key="%englishUnit" value="(length-fathom|length-furlong|mass-stone|volume-bushel)"': PLEASE DO add all other imperial units (except inch, for some more years...) to that list.
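A quick check of the first point, assuming the coverage value is applied at the start of the locale ID in the way Python's re.match applies it (a negative lookahead behaves the same way in Java's java.util.regex). Here nb is Norwegian Bokmål, nn Norwegian Nynorsk and no the Norwegian macrolanguage; the helper name is hypothetical.

```python
import re

# The two candidate coverage values quoted in the comment.
ORIGINAL = re.compile(r"(?!sv|nb|no)")
SUGGESTED = re.compile(r"(?!sv|nb|nn)")

LOCALES = ["sv", "nb", "nn", "no", "da", "en"]


def excluded(pattern: re.Pattern, locales: list) -> list:
    """Return the locale IDs that the negative lookahead rejects at position 0."""
    # The lookahead is zero-width: match() returns an empty match when the
    # locale is allowed, and None when it starts with an excluded code.
    return [loc for loc in locales if pattern.match(loc) is None]


print(excluded(ORIGINAL, LOCALES))   # ['sv', 'nb', 'no']
print(excluded(SUGGESTED, LOCALES))  # ['sv', 'nb', 'nn']
```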

That is for the Survey Tool (ST) only; you also need to "tell" application programmers that imperial units are inappropriate outside of US/GB.

We cannot add all units: it is unclear whether we need ampere-hour or milliampere-hour, or volume-scand-cubic-mile.

CLDR can't add all units (especially since the systems are logically productive), but (milli)ampere-hour is a common unit for battery capacity. All portable computers and all mobile phones have batteries... I would expect a (good) battery monitor to state the battery capacity, whether the nominal/original capacity or, even better, the estimated actual maximum capacity, as batteries tend to degrade over time. Also, the "estimated remaining capacity" (remaining charge) in mAh, not just as a percentage of the maximum, could/should be of interest to display.
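A sketch of what such a battery monitor might compute, assuming a 3000 mAh cell with a nominal voltage of 3.7 V that has degraded to 80% of its design capacity; all figures and function names are illustrative assumptions.

```python
def mah_to_wh(capacity_mah: float, nominal_voltage: float) -> float:
    """Energy (Wh) = charge (Ah) x nominal voltage (V)."""
    return capacity_mah / 1000 * nominal_voltage


def remaining_charge_mah(design_mah: float, health: float, percent_left: float) -> float:
    """Remaining charge in mAh.

    health: estimated actual maximum capacity as a fraction of design capacity.
    percent_left: state of charge as a percentage of that actual maximum.
    """
    actual_max_mah = design_mah * health
    return actual_max_mah * percent_left / 100


print(round(mah_to_wh(3000, 3.7), 2))              # 11.1
print(round(remaining_charge_mah(3000, 0.8, 50)))  # 1200
```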

See A·h or Ah (it is permissible to leave out the dot, even without space) in http://en.wikipedia.org/wiki/Ampere-hour and http://en.wikipedia.org/wiki/Lithium-ion_battery.

scand-cubic-mile can wait (though that seems a bit unbalanced, since you have the corresponding, and not all that common, English cubic mile), but it IS used.

Better "type" (category) names:

<unit type="mass-metric-ton"> -->> <unit type="mass-tonne">
<unit type="mass-ton"> -->> <unit type="mass-short-ton">
"acceleration-meter-per-second-squared" --> "acceleration-meter-per-second-per-second"

"Better" is debatable, but I'll bring these to the committee as well.

Well, "metric ton" does not work well with prefixes ("megametric ton"? naa). While more directed at the actual (full) name than the type name, they usually coincide (after the category/type, and ignoring hyphens) for English. "Short ton" makes it clear that it is not the "tonne". That is important both for translators and "end users". Re. "meter-per-second-per-second", that is the logic of the unit, and "square seconds" (and it will be translated as if it said that) may be a bit too strange, though I would be ok with it. "second-two" (another suggested "translated as" alternative) is no better (and quite ugly).

comment:10 Changed 3 years ago by mark

  • Status changed from assigned to reviewing
  • Review set to pedberg

comment:11 Changed 3 years ago by kent.karlsson14@…

  1. The following are wrong:

     <displayName>degrees kelvin</displayName>
     <unitPattern count="one">{0} degree kelvin</unitPattern>
     <unitPattern count="other">{0} degrees kelvin</unitPattern>

     and

     <displayName>°K</displayName>
     <unitPattern count="one">{0}°K</unitPattern>
     <unitPattern count="other">{0}°K</unitPattern>

     There are no "degrees kelvin": the unit is called "kelvin", and its symbol is K, NOT °K.

  2. In addition, for "weather", you have missed an important unit, mm/h, for precipitation.
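A sketch of corrected English kelvin data plus a simple check for the forms objected to in the first point. The plural "kelvins" (ordinary English pluralization of the unit name) is an assumption, as are all names here; the plural rule is simplified to integers.

```python
import re

# Corrected English patterns; the plural "kelvins" is an assumption.
KELVIN_LONG = {"one": "{0} kelvin", "other": "{0} kelvins"}
KELVIN_SHORT = {"one": "{0} K", "other": "{0} K"}  # symbol K, no degree sign

# Forms the comment objects to: "°K" and "degree(s) kelvin".
BAD_FORMS = re.compile(r"°K|degrees? kelvin", re.IGNORECASE)


def plural_category_en(n: int) -> str:
    """Simplified English plural rule for integers: only 1 takes 'one'."""
    return "one" if n == 1 else "other"


def format_kelvin(n: int, patterns: dict) -> str:
    """Select the plural form and substitute the number."""
    return patterns[plural_category_en(n)].format(n)


def find_bad_patterns(patterns: dict) -> list:
    """Return every pattern that uses a degree sign with K or 'degree(s) kelvin'."""
    return [p for p in patterns.values() if BAD_FORMS.search(p)]


OLD_SHORT = {"one": "{0}°K", "other": "{0}°K"}
print(format_kelvin(1, KELVIN_LONG))     # 1 kelvin
print(format_kelvin(300, KELVIN_SHORT))  # 300 K
print(find_bad_patterns(OLD_SHORT))      # ['{0}°K', '{0}°K']
print(find_bad_patterns(KELVIN_LONG))    # []
```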

comment:12 Changed 3 years ago by kent.karlsson14@…

I do object to removing the Scandinavian mile.

All my other comments are still standing as well! Please read comment 9 above.

comment:13 Changed 3 years ago by pedberg

  • Status changed from reviewing to closed
  • Resolution set to fixed

comment:14 Changed 3 years ago by markus

  • Phase set to dsub
  • Milestone changed from 26dsub to 26