[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #9949 (accepted data)

Opened 17 months ago

Last modified 13 days ago

Hebrew time in MeasureFormat style

Reported by: markus
Owned by: pedberg
Component: datetime
Data Locale: he
Phase: dsub
Review:
Weeks: 0.1
Data Xpath:
Xref:

Description

We got this report:

In Hebrew, ICU MeasureFormat formats "1 hour and 20 minutes" as "שעה ו20 דקות".

This is technically correct, but contemporary Hebrew uses dashes between the prefix and the number like so: "שעה ו-20 דקות".

GoogleIssue:21715280

Change History

comment:1 Changed 14 months ago by pedberg

  • Cc pedberg added

comment:2 Changed 11 months ago by grhoten

Siri does this for lists of items "anded" together too. This ticket seems correct. Only native-script items omit the hyphen in Hebrew in this scenario.

If 20 were spelt out in Hebrew, then the hyphen would go away.

comment:3 Changed 5 months ago by mark

The issue appears to be in the ICU code. It appears that the code is not using the unit list patterns, which have no ו character for the listPatternPart type="2".

	<listPatterns>
		...
		<listPattern type="unit">
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0} ו{1}</listPatternPart>
			<listPatternPart type="2">{0}, {1}</listPatternPart>
		</listPattern>
		<listPattern type="unit-narrow">
			<listPatternPart type="start">{0} {1}</listPatternPart>
			<listPatternPart type="middle">{0} {1}</listPatternPart>
			<listPatternPart type="end">{0} {1}</listPatternPart>
			<listPatternPart type="2">{0} {1}</listPatternPart>
		</listPattern>
		<listPattern type="unit-short">
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0}, {1}</listPatternPart>
			<listPatternPart type="2">{0}, {1}</listPatternPart>
		</listPattern>
	</listPatterns>

comment:4 Changed 5 months ago by mark

  • Cc shane added

Or maybe some other pattern is being used? Maybe:

		<listPattern>
			<listPatternPart type="start">{0}, {1}</listPatternPart>
			<listPatternPart type="middle">{0}, {1}</listPatternPart>
			<listPatternPart type="end">{0} ו{1}</listPatternPart>
			<listPatternPart type="2">{0} ו{1}</listPatternPart>
		</listPattern>

Adding Shane to this discussion.

comment:5 Changed 5 months ago by shane

Test code:

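    import com.ibm.icu.text.MeasureFormat;
    import com.ibm.icu.text.MeasureFormat.FormatWidth;
    import com.ibm.icu.util.Measure;
    import com.ibm.icu.util.TimeUnit;
    import com.ibm.icu.util.VersionInfo;
    import java.util.Locale;
    System.out.format("ICU Version: %s%n", VersionInfo.ICU_VERSION.toString());
    MeasureFormat mf = MeasureFormat.getInstance(new Locale("he"), FormatWidth.WIDE);
    String result = mf.formatMeasures(
        new Measure(1, TimeUnit.HOUR),
        new Measure(20, TimeUnit.MINUTE));
    System.out.println(result);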

The output varies by ICU/CLDR version.

ICU Version: 54.1.1.0
שעה ו20 דקות

ICU Version: 55.1.0.0
שעה ו20 דקות

ICU Version: 56.1.0.0
שעה ו20 דקות

ICU Version: 57.1.0.0
שעה ו20 דקות

ICU Version: 58.1.0.0
שעה, 20 דקות

ICU Version: 59.1.0.0
שעה, 20 דקות

ICU Version: 60.1.0.0
שעה, 20 דקות

Looking at the code through a debugger, the list data comes from listPatterns/unit/2 in the he.xml data bundle. This data changed between CLDR 29 and CLDR 30 (between ICU 57 and ICU 58).

CLDR 29:

<listPatterns>
...
  <listPattern type="unit">
    <listPatternPart type="start">{0}, {1}</listPatternPart>
    <listPatternPart type="middle">{0}, {1}</listPatternPart>
    <listPatternPart type="end">{0} ו{1}</listPatternPart>
    <listPatternPart type="2">{0} ו{1}</listPatternPart>
  </listPattern>
...
</listPatterns>

CLDR 30 (and 32):

<listPatterns>
...
  <listPattern type="unit">
    <listPatternPart type="start">{0}, {1}</listPatternPart>
    <listPatternPart type="middle">{0}, {1}</listPatternPart>
    <listPatternPart type="end">{0} ו{1}</listPatternPart>
    <listPatternPart type="2">{0}, {1}</listPatternPart>
  </listPattern>
...
</listPatterns>

This is what the ICU MeasureFormat list formatter is doing. I'm not sure what the expected behavior should be. Thoughts?
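
For illustration, here is a minimal sketch of how the two-item ("2") pattern alone accounts for the change in output between ICU 57 and ICU 58. This is not ICU's implementation: `UnitPairPatternDemo` and `joinPair` are hypothetical names, and the substitution uses plain `String.replace` on the assumption that the formatted measures contain no braces. The pattern strings and the Hebrew text are copied from this ticket.

```java
final class UnitPairPatternDemo {
    // listPatternPart type="2" of the he "unit" list, as quoted above.
    static final String CLDR_29_PAIR = "{0} ו{1}";
    static final String CLDR_30_PAIR = "{0}, {1}";

    // Hypothetical helper: substitute two already-formatted measures
    // into a two-item list pattern.
    static String joinPair(String pattern, String first, String second) {
        return pattern.replace("{0}", first).replace("{1}", second);
    }

    public static void main(String[] args) {
        // Formatted measures taken from the outputs listed above.
        String hour = "שעה";
        String minutes = "20 דקות";
        System.out.println(joinPair(CLDR_29_PAIR, hour, minutes));
        System.out.println(joinPair(CLDR_30_PAIR, hour, minutes));
    }
}
```

With the CLDR 29 pattern the ו is attached directly to the digits, which is the form reported in the description. With the CLDR 30 pattern there is no ו in the two-item case at all.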

comment:6 Changed 5 months ago by mark

Hmmm. Looks like the specific problem noted won't occur now.

It could still come up with 3 units, e.g. 3 days, 2 hours, and 5 minutes, since the end pattern still has the ו. Probably a lot less frequent...

If we know in Hebrew that the first element of the last item is always a digit, then we could change the end pattern (for units!) to put a hyphen after the ו:

<listPatternPart type="end">{0} ו-{1}</listPatternPart>
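
A rough sketch of how the three-unit case would come out with that end pattern, assuming the usual way of composing start/middle/end list patterns (the last two items are joined with "end", and the result is then wrapped leftwards with "middle" and "start"). `UnitListPatternDemo`, `apply`, and `formatList` are hypothetical names, not ICU API, and the three Hebrew unit strings are hand-written placeholders rather than actual MeasureFormat output.

```java
import java.util.Arrays;
import java.util.List;

final class UnitListPatternDemo {
    // Hypothetical helper: fill a two-placeholder list pattern.
    static String apply(String pattern, String a, String b) {
        return pattern.replace("{0}", a).replace("{1}", b);
    }

    // Compose a list from CLDR-style start/middle/end/2 patterns.
    static String formatList(List<String> items, String start, String middle,
                             String end, String two) {
        int n = items.size();
        if (n == 0) return "";
        if (n == 1) return items.get(0);
        if (n == 2) return apply(two, items.get(0), items.get(1));
        // Join the last two items, then prepend the earlier ones right to left.
        String result = apply(end, items.get(n - 2), items.get(n - 1));
        for (int i = n - 3; i >= 1; i--) {
            result = apply(middle, items.get(i), result);
        }
        return apply(start, items.get(0), result);
    }

    public static void main(String[] args) {
        // Placeholder strings for "3 days", "2 hours", "5 minutes".
        List<String> items = Arrays.asList("3 ימים", "2 שעות", "5 דקות");
        // CLDR 30 unit patterns, with the proposed hyphenated end pattern.
        System.out.println(formatList(items,
            "{0}, {1}", "{0}, {1}", "{0} ו-{1}", "{0}, {1}"));
    }
}
```

Note that the hyphen is unconditional here, so the result is only right if the last item always starts with a digit.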

comment:7 Changed 5 months ago by shane

Maybe do something like we do with currency spacing, where we check if a certain condition is met post-processing and optionally add a spacing character?

If this is only an isolated issue, we probably shouldn't over-engineer it.
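
As a sketch only, a post-processing check of that kind could look like the following. It is not how ICU handles currency spacing and it uses no ICU API: `VavHyphenPostProcessor` and `addHyphen` are hypothetical names, and the rule (insert a hyphen after a token-initial ו that is immediately followed by a decimal digit) is an assumption based on the description in this ticket.

```java
import java.util.regex.Pattern;

final class VavHyphenPostProcessor {
    // U+05D5 HEBREW LETTER VAV, not preceded by a letter or number
    // (so it starts a token) and immediately followed by a decimal digit.
    private static final Pattern VAV_BEFORE_DIGIT =
        Pattern.compile("(?<![\\p{L}\\p{N}])\u05D5(?=\\p{Nd})");

    // Hypothetical post-processing step applied to the formatted list.
    static String addHyphen(String formatted) {
        return VAV_BEFORE_DIGIT.matcher(formatted).replaceAll("\u05D5-");
    }

    public static void main(String[] args) {
        System.out.println(addHyphen("שעה ו20 דקות"));
        // A spelled-out number is left untouched.
        System.out.println(addHyphen("שעה ועשרים דקות"));
    }
}
```

One drawback of working on the finished string is that the check cannot tell whether the ו came from the list pattern or from one of the items.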

comment:8 Changed 2 months ago by kirill

  • Priority changed from assess to medium

comment:9 Changed 2 weeks ago by kristi

  • Owner changed from anybody to discuss
  • Status changed from new to design

comment:10 Changed 13 days ago by pedberg

  • Owner changed from discuss to pedberg
  • Status changed from design to accepted
  • Milestone changed from UNSCH to 34

This is a particular example of a case in which a conditional pattern would be useful.

In Hebrew, such a pattern might insert a hyphen only if the following character is numeric or Latin.

In some Latin-script languages, such a pattern might modify a preposition to be "de" or "d’" depending on whether the following letter is a vowel, or might modify an article to use "l’" depending on the following letter.

I think there is a separate ticket for that. Anyway, I will look into this.
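
To make the Hebrew case concrete, here is a minimal sketch of a conditional join that picks the hyphenated form only when the following item starts with a digit or a Latin-script character. CLDR has no such conditional pattern syntax in the data quoted here, so this is purely illustrative: `ConditionalVavJoin`, `needsHyphen`, and `joinWithVav` are hypothetical names, and the two pattern strings are the existing end pattern and the hyphenated variant suggested in comment:6.

```java
final class ConditionalVavJoin {
    // True if the item starts with a digit or a Latin-script character.
    static boolean needsHyphen(String next) {
        if (next.isEmpty()) return false;
        int cp = next.codePointAt(0);
        return Character.isDigit(cp)
            || Character.UnicodeScript.of(cp) == Character.UnicodeScript.LATIN;
    }

    // Hypothetical helper: choose the pattern based on the second item,
    // then substitute both items into it.
    static String joinWithVav(String first, String second) {
        String pattern = needsHyphen(second) ? "{0} ו-{1}" : "{0} ו{1}";
        return pattern.replace("{0}", first).replace("{1}", second);
    }

    public static void main(String[] args) {
        System.out.println(joinWithVav("שעה", "20 דקות"));    // digit: hyphen
        System.out.println(joinWithVav("שעה", "עשרים דקות")); // Hebrew: no hyphen
    }
}
```

Unlike a check on the finished string, this decision is made at substitution time, where it is known that the ו comes from the pattern.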
