
CLDR Ticket #9949 (new data)

Opened 13 months ago

Last modified 7 weeks ago

Hebrew time in MeasureFormat style

Reported by: markus
Owned by: anybody
Component: datetime
Data Locale: he
Phase: dsub
Review:
Weeks: 0.1
Data Xpath:
Xref:

Description

We got this report:

In Hebrew, ICU MeasureFormat formats "1 hour and 20 minutes" as "שעה ו20 דקות".

This is technically correct, but contemporary Hebrew uses dashes between the prefix and the number like so: "שעה ו-20 דקות".

GoogleIssue:21715280

Change History

comment:1 Changed 10 months ago by pedberg

  • Cc pedberg added

comment:2 Changed 8 months ago by grhoten

Siri does this for lists of items joined with "and" too. This ticket seems correct: in Hebrew, only items written in the native script go without the hyphen in this scenario.

If 20 were spelt out in Hebrew, then the hyphen would go away.

comment:3 Changed 7 weeks ago by mark

The issue appears to be in the ICU code. It appears that the code is not using the unit list patterns, which have no ו character for the listPatternPart type="2".

	<listPatterns>
		...
		<listPattern type="unit">
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0} ו{1}</listPatternPart>
			<listPatternPart type="2">{0}, {1}</listPatternPart>
		</listPattern>
		<listPattern type="unit-narrow">
			<listPatternPart type="start">{0} {1}</listPatternPart>
			<listPatternPart type="middle">{0} {1}</listPatternPart>
			<listPatternPart type="end">{0} {1}</listPatternPart>
			<listPatternPart type="2">{0} {1}</listPatternPart>
		</listPattern>
		<listPattern type="unit-short">
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0}, {1}</listPatternPart>
			<listPatternPart type="2">{0}, {1}</listPatternPart>
		</listPattern>
	</listPatterns>

comment:4 Changed 7 weeks ago by mark

  • Cc shane added

Or maybe some other pattern is being used? Maybe:

		<listPattern>
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0} ו{1}</listPatternPart>
			<listPatternPart type="2">{0} ו{1}</listPatternPart>
		</listPattern>

Adding Shane to this discussion.

comment:5 Changed 7 weeks ago by shane

Test code:

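    import java.util.Locale;
    import com.ibm.icu.text.MeasureFormat;
    import com.ibm.icu.text.MeasureFormat.FormatWidth;
    import com.ibm.icu.util.Measure;
    import com.ibm.icu.util.TimeUnit;
    import com.ibm.icu.util.VersionInfo;
    System.out.format("ICU Version: %s%n", VersionInfo.ICU_VERSION.toString());
    MeasureFormat mf = MeasureFormat.getInstance(new Locale("he"), FormatWidth.WIDE);
    String result = mf.formatMeasures(
        new Measure(1, TimeUnit.HOUR),
        new Measure(20, TimeUnit.MINUTE));
    System.out.println(result);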

The output varies by ICU/CLDR version.

ICU Version: 54.1.1.0
שעה ו20 דקות

ICU Version: 55.1.0.0
שעה ו20 דקות

ICU Version: 56.1.0.0
שעה ו20 דקות

ICU Version: 57.1.0.0
שעה ו20 דקות

ICU Version: 58.1.0.0
שעה, 20 דקות

ICU Version: 59.1.0.0
שעה, 20 דקות

ICU Version: 60.1.0.0
שעה, 20 דקות

Looking at the code through a debugger, the list data comes from listPatterns/unit/2 in the he.xml data bundle. This data changed between CLDR 29 and CLDR 30 (between ICU 57 and ICU 58).

CLDR 29:

<listPatterns>
...
  <listPattern type="unit">
    <listPatternPart type="start">{0}, {1}</listPatternPart>
    <listPatternPart type="middle">{0}, {1}</listPatternPart>
    <listPatternPart type="end">{0} ו{1}</listPatternPart>
    <listPatternPart type="2">{0} ו{1}</listPatternPart>
  </listPattern>
...
</listPatterns>

CLDR 30 (and 32):

<listPatterns>
...
  <listPattern type="unit">
    <listPatternPart type="start">{0}, {1}</listPatternPart>
    <listPatternPart type="middle">{0}, {1}</listPatternPart>
    <listPatternPart type="end">{0} ו{1}</listPatternPart>
    <listPatternPart type="2">{0}, {1}</listPatternPart>
  </listPattern>
...
</listPatterns>

This is what the ICU MeasureFormat list formatter is doing. I'm not sure what the expected behavior should be. Thoughts?
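
To make the version difference concrete, here is a minimal sketch of how a list formatter chooses among the four pattern parts. It does not call ICU: the class and method names are hypothetical, the pattern strings are copied from the CLDR 29 and CLDR 30 data quoted above, and the three-item input is made of illustrative pre-formatted strings rather than verified ICU output. The selection logic follows the general LDML rule (two items use the "2" pattern; longer lists combine "end", then "middle", then "start").

```java
import java.util.List;
import java.util.Map;

public class ListPatternSketch {
    // Hebrew "unit" list patterns as quoted in this ticket.
    static final Map<String, String> CLDR_29 = Map.of(
        "start", "{0}, {1}", "middle", "{0}, {1}",
        "end", "{0} ו{1}", "2", "{0} ו{1}");
    static final Map<String, String> CLDR_30 = Map.of(
        "start", "{0}, {1}", "middle", "{0}, {1}",
        "end", "{0} ו{1}", "2", "{0}, {1}");

    // Substitute two already-formatted items into one pattern part.
    static String apply(String pattern, String a, String b) {
        return pattern.replace("{0}", a).replace("{1}", b);
    }

    // Two items use "2"; longer lists are built from the end backwards.
    static String format(Map<String, String> p, List<String> items) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        if (n == 2) return apply(p.get("2"), items.get(0), items.get(1));
        String result = apply(p.get("end"), items.get(n - 2), items.get(n - 1));
        for (int i = n - 3; i >= 1; i--) {
            result = apply(p.get("middle"), items.get(i), result);
        }
        return apply(p.get("start"), items.get(0), result);
    }

    public static void main(String[] args) {
        List<String> two = List.of("שעה", "20 דקות");
        // Placeholder strings for "3 days", "2 hours", "5 minutes".
        List<String> three = List.of("3 ימים", "2 שעות", "5 דקות");
        System.out.println(format(CLDR_29, two));   // uses the old "2" pattern with ו
        System.out.println(format(CLDR_30, two));   // uses the new "2" pattern with a comma
        System.out.println(format(CLDR_30, three)); // still goes through the "end" pattern with ו
    }
}
```

With two items only the "2" part is consulted, which is why the change between CLDR 29 and CLDR 30 alters the two-unit output while the "end" part is unchanged.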

comment:6 Changed 7 weeks ago by mark

Hmmm. Looks like the specific problem noted won't occur now.

It could still come up with 3 units (3 days, 2 hours, and 5 minutes), since the end pattern still contains ו. That is probably a lot less frequent...

If we know that in Hebrew the first character of the last item is always a digit, then we could change the end pattern (for units!) to put a hyphen after the ו:

<listPatternPart type="end">{0} ו-{1}</listPatternPart>
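
A small sketch of what that pattern would do, and why the "always a digit" condition matters. It is plain Java with hypothetical names and does not use ICU. The input strings are illustrative; "שעה" is taken from the output quoted in this ticket, where 1 hour is formatted without a digit, so it stands in for a last item that starts with a letter.

```java
public class HyphenEndPatternSketch {
    // The end pattern proposed in this comment.
    static final String PROPOSED_END = "{0} ו-{1}";

    // Substitute two already-formatted items into the proposed pattern.
    static String applyEnd(String a, String b) {
        return PROPOSED_END.replace("{0}", a).replace("{1}", b);
    }

    // True when the item begins with a digit, the case the hyphen is meant for.
    static boolean startsWithDigit(String s) {
        return !s.isEmpty() && Character.isDigit(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String digitLast = "5 דקות"; // placeholder for "5 minutes"
        String wordLast = "שעה";     // "1 hour" as formatted in this ticket
        System.out.println(applyEnd("2 שעות", digitLast) + "  digit first: " + startsWithDigit(digitLast));
        System.out.println(applyEnd("3 ימים", wordLast) + "  digit first: " + startsWithDigit(wordLast));
    }
}
```

The second call shows the risk: a static pattern adds the hyphen even when the last item starts with a Hebrew letter, which, per comment 2, is the case where the hyphen should not appear.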

comment:7 Changed 7 weeks ago by shane

Maybe do something like we do with currency spacing, where we check if a certain condition is met post-processing and optionally add a spacing character?

If this is only an isolated issue, we probably shouldn't over-engineer it.
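
For discussion, a rough sketch of what such a post-processing check could look like. This is not how ICU handles currency spacing and not an existing ICU API; the class and method are hypothetical, and the rule (insert "-" only when ו sits directly before {1} and the second item starts with a digit) is an assumption based on comments 2 and 6.

```java
public class ConjunctionHyphenSketch {
    static final char VAV = '\u05D5'; // Hebrew letter vav, the "and" prefix

    // Apply a two-slot list pattern, adding a hyphen after a prefixed vav
    // only when the item that follows begins with a digit.
    static String joinWithCheck(String pattern, String a, String b) {
        int slot = pattern.indexOf("{1}");
        boolean vavBeforeSlot = slot > 0 && pattern.charAt(slot - 1) == VAV;
        boolean digitFirst = !b.isEmpty() && Character.isDigit(b.codePointAt(0));
        String second = (vavBeforeSlot && digitFirst) ? "-" + b : b;
        return pattern.replace("{0}", a).replace("{1}", second);
    }

    public static void main(String[] args) {
        String end = "{0} ו{1}"; // unchanged end pattern from the CLDR data
        System.out.println(joinWithCheck(end, "שעה", "20 דקות"));      // hyphen inserted
        System.out.println(joinWithCheck(end, "3 ימים", "שעה"));       // no hyphen before a letter
        System.out.println(joinWithCheck("{0}, {1}", "שעה", "20 דקות")); // pattern without vav is untouched
    }
}
```

This keeps the CLDR pattern data as-is and moves the condition into code, at the cost of a Hebrew-specific rule in the formatter, which is the over-engineering concern raised above.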
