[Unicode] Common Locale Data Repository: Bug Tracking
 

Ticket #1085 (closed defect: wontfix)

Opened 3 years ago

Dependency on Unicode Standard version

Reported by: emuller(at)adobe.com Owned by: deborah
Component: unknown Version:

Description

Consider the short date format for the ar locale, "dd/MM/yy". Once the fields are
replaced by an actual date, the display of the resulting string at a base
paragraph level of 1 (right-to-left) depends on the version of Unicode that is
used (specifically, it changes between 4.0 and 4.0.1, when the bidi class of /
changed from ES to CS). CLDR needs some way to record the version(s) of Unicode
it is compatible with.

It is quite possible that a change to Unicode would affect only a small number
of locales (e.g. the impending changes for Myanmar), so maybe this needs to be
recorded per locale (although this can become problematic with inheritance).
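A minimal sketch of what per-locale recording could look like, assuming a hypothetical `MIN_UNICODE` table (not an existing CLDR field) and plain truncation inheritance (ar_SA, then ar, then root), ignoring CLDR's explicit parent-locale overrides. Because a locale inherits data from every ancestor, its effective requirement is taken as the newest version listed anywhere on its chain. The version values are illustrative only.

```python
# Hypothetical per-locale metadata: the minimum Unicode version whose bidi
# properties the locale's data was vetted against. NOT an actual CLDR field.
MIN_UNICODE = {
    "root": (4, 0),
    "ar": (4, 0, 1),  # illustrative value only
}


def parent_chain(locale_id):
    """Simple truncation fallback, e.g. ar_SA -> ar -> root.

    Ignores CLDR's explicit parentLocales exceptions.
    """
    chain = []
    parts = [] if locale_id == "root" else locale_id.split("_")
    while parts:
        chain.append("_".join(parts))
        parts.pop()
    chain.append("root")
    return chain


def effective_min_unicode(locale_id, table=MIN_UNICODE):
    """A locale inherits data from every ancestor, so it needs the newest
    version required anywhere on its chain (tuples compare element-wise)."""
    versions = [table[loc] for loc in parent_chain(locale_id) if loc in table]
    return max(versions)
```

With this table, ar_SA resolves to (4, 0, 1) through its ar parent, while a locale with no entry of its own, such as fr, falls back to the root value.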

Change History

Changed 40 years ago by notes2

This will also need a spec change to indicate that LRM is to be ignored when
parsing (it should probably be broader, to include other Cf characters).
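A minimal sketch of such lenient parsing, assuming the broader rule (drop every General_Category=Cf character, which covers LRM and RLM) and a fixed dd/MM/yyyy pattern. `strip_format_chars` and `parse_dd_mm_yyyy` are hypothetical helpers, not ICU or CLDR APIs. Python's `int()` accepts any Unicode decimal digits, so Arabic-Indic digits need no special handling.

```python
import unicodedata
from datetime import date


def strip_format_chars(text):
    """Drop every General_Category=Cf character (LRM, RLM, ZWJ, ...)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def parse_dd_mm_yyyy(text):
    """Leniently parse a dd/MM/yyyy string, ignoring Cf characters.

    int() accepts any Unicode decimal digits, so Arabic-Indic digits
    (U+0660..U+0669) parse the same way as ASCII digits.
    """
    parts = strip_format_chars(text).split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated fields: {text!r}")
    day, month, year = (int(p) for p in parts)
    return date(year, month, day)
```

For example, an Arabic-Indic "25/12/2007" with an RLM before each slash parses to the same `date(2007, 12, 25)` as the unmarked ASCII string.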

Changed 40 years ago by old_notes2

We agreed to add LRM where necessary in date and time formats for bidi locales
to ensure that - and / are ordered consistently regardless of the
environment.

Changed 3 years ago by emuller(at)adobe.com

(Guest Reply)

See also 1084, 1085, 1086.

Changed 3 years ago by guest

sent reply 1

Changed 3 years ago by mark

changed notes2

Changed 3 years ago by mark

moved from incoming to spec

Changed 3 years ago by deborah

changed notes2

Changed 3 years ago by deborah

moved from spec to discuss

Changed 3 years ago by Mark Davis <mark.davis(at)jtcsv.com>

This is a general issue, since it also comes up for sorting, etc. Our discussion is not complete, but here are some notes from the meeting.

0. Part of the issue is that what the vetters approve depends on how well the browser follows the bidi algorithm, and on which browser is being used. One possibility is to modify the survey tool to insert LRM/RLMs so that the display is the same everywhere, but that would mask the fact that in actual use the display may vary.

1. One possibility is to tie the CLDR version to a Unicode version, e.g. the latest version available. The advantage is that the definition is clear, and a specification of sorting, for example, doesn't need two version numbers (UCA + CLDR). The disadvantages are that people may want to use the latest CLDR with an earlier version of Unicode, or with systems that aren't fully compliant with Unicode version X.

2. An alternative is to try to make the formats as independent of Unicode as possible. E.g. for sorting, root is assumed to be the latest UCA; we can keep, somewhere, the delta tailoring that modifies older UCAs to match the latest. For formats with bidi, we could add LRM/RLMs to make the actual stored formats independent of the Unicode version. It needs to be clear that parsing must ignore those characters.

Changed 3 years ago by mark

sent reply 2

Changed 3 years ago by mike.tardif(at)adobe.com

(Guest Reply)

Allow me to briefly frame the discussion around a specific
locale's date format to better illustrate the issue.

Consider the Saudi medium Gregorian date pattern

dd/MM/yyyy

When parsing Saudi dates, I have little difficulty
understanding that the logical input of two Arabic-Indic digits
(corresponding to the day of the month), followed by the
solidus character, followed by two Arabic-Indic digits
(corresponding to the month of the year), followed by the
solidus character, and followed by four Arabic-Indic digits
(corresponding to the year of the era) represents a valid
date value.

My concern arises when formatting Saudi date values: when
the CLDR states that the medium date pattern is

dd/MM/yyyy

it couldn't have been done without some expectation of a
particular rendering: that is, that the logical sequence of two
Arabic-Indic digits followed by the solidus character,
followed by two Arabic-Indic digits, followed by the solidus
character, followed by four Arabic-Indic digits, in an RTL
text direction context will yield a specific rendering.

In a Unicode 4.0.1-compliant environment, where the solidus
character has the CS bidi property, the rendered date value
will exhibit the year component on the right.

In a Unicode 4.0-compliant environment, where the solidus
character has the ES bidi property, the rendered date value
will exhibit the year component on the left.

One can witness the differing renderings by viewing the page

 http://www-950.ibm.com/software/globalization/icu/demo/locales/?d_=en&_=ar_SA

in Firefox and IE. That browsers have different levels of
Unicode compliance isn't the issue.

Rather, the translators and vetters of CLDR pattern data for
Hebrew and Arabic locales have (perhaps unconsciously)
subscribed to an expected rendering that was based upon
compliance to a specific Unicode standard.

IMHO, that's what needs to be redressed: spell out
the expected rendering (effectively by stating compliance with
a particular Unicode 4.x standard) and then, if possible,
specify patterns that will yield visually uniform renderings
across the various Unicode 4.x standards.
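To make the two renderings described above concrete, here is a deliberately simplified subset of the bidi algorithm (UAX #9) that handles only Arabic-Indic digits (class AN), the solidus, LRM and RLM in a right-to-left paragraph (embedding level 1). It applies rule W4 (a single CS between two ANs becomes AN, while an ES between ANs does not), W6, N1/N2, I2 and the L2 reordering. It is a sketch for this one example, not a conforming implementation; the solidus class is passed in as "ES" (as in Unicode 4.0) or "CS" (as in 4.0.1).

```python
ARABIC_INDIC = {chr(0x0660 + i) for i in range(10)}  # U+0660..U+0669, class AN
LRM, RLM = "\u200e", "\u200f"  # classes L and R


def bidi_classes(text, solidus_class):
    """Map each character to its bidi class; solidus_class is "ES" or "CS"."""
    classes = []
    for ch in text:
        if ch in ARABIC_INDIC:
            classes.append("AN")
        elif ch == "/":
            classes.append(solidus_class)
        elif ch == LRM:
            classes.append("L")
        elif ch == RLM:
            classes.append("R")
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return classes


def _strong(c):
    # For N1, AN (and EN) act as R; only L counts as left-to-right here.
    return "L" if c == "L" else "R"


def resolve_levels(classes):
    """Resolve embedding levels for an RTL paragraph (paragraph level 1)."""
    t = list(classes)
    n = len(t)
    # W4: a single CS between two ANs becomes AN; an ES is left alone.
    for i in range(1, n - 1):
        if t[i] == "CS" and t[i - 1] == "AN" and t[i + 1] == "AN":
            t[i] = "AN"
    # W6: any remaining separators become Other Neutral.
    t = ["ON" if c in ("ES", "CS") else c for c in t]
    # N1/N2: a run of neutrals takes its neighbours' direction when they
    # agree; otherwise it takes the embedding direction, R.
    i = 0
    while i < n:
        if t[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and t[j] == "ON":
            j += 1
        before = _strong(t[i - 1]) if i > 0 else "R"  # sos is R at level 1
        after = _strong(t[j]) if j < n else "R"       # eos is R at level 1
        for k in range(i, j):
            t[k] = before if before == after else "R"
        i = j
    # I2: at odd level 1, L and AN go up to level 2; R stays at level 1.
    return [1 if c == "R" else 2 for c in t]


def visual_order(text, levels):
    """L2: from the highest level down to 1, reverse every maximal run of
    characters at that level or higher. Returns left-to-right display order."""
    order = list(range(len(text)))
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return "".join(text[k] for k in order)


def render(text, solidus_class):
    return visual_order(text, resolve_levels(bidi_classes(text, solidus_class)))
```

For an Arabic-Indic "25/12/2007", the CS variant returns the string unchanged (day at the left end, year at the right end), whereas the ES variant moves the four-digit year to the left end, matching the two behaviours described in this comment.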

Changed 3 years ago by guest

sent reply 3

Changed 3 years ago by matial(at)il.ibm.com

(Guest Reply)

The change in Unicode's classification of Solidus from ES to CS causes Arabic dates in Arabic-Indic digits to be displayed incorrectly. Adding Unicode version numbers to CLDR version numbers may help in understanding what went wrong, but it is no cure for the problem.
In this case, adding an LRM before (or after) each of the slashes in the template will cause the dates to be displayed correctly with any Unicode version known today.

We cannot guarantee that such a "universal" template can be created in every problematic case, so maybe the version numbers are useful, but for this example the improved template seems preferable.
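A minimal sketch of the template change, assuming a hypothetical helper `mark_separators` that inserts a bidi mark before every slash outside quoted literals (in UTS #35 patterns, text between apostrophes is literal and '' stands for a single apostrophe). LRM is the default here, following the suggestion above; RLM can be passed instead.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def mark_separators(pattern, mark=LRM, separators="/"):
    """Insert `mark` before each separator that is outside a quoted literal.

    Each apostrophe toggles the quoting state; '' (an escaped apostrophe)
    toggles it twice, leaving it unchanged. A separator already preceded
    by `mark` is skipped, so applying the function twice is harmless.
    """
    out = []
    in_quote = False
    for ch in pattern:
        if ch == "'":
            in_quote = not in_quote
        elif ch in separators and not in_quote and not (out and out[-1] == mark):
            out.append(mark)
        out.append(ch)
    return "".join(out)
```

For example, `mark_separators("dd/MM/yyyy", RLM)` yields the pattern with an RLM before each slash, while a quoted literal such as `dd'/'MM` is left untouched.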

Changed 3 years ago by guest

sent reply 4

Changed 3 years ago by mark

changed notes2

Changed 3 years ago by mark

changed notes2

Changed 3 years ago by mark

moved from discuss to data

Changed 3 years ago by deborah

changed notes2

Changed 3 years ago by mike.tardif(at)adobe.com

(Guest Reply)

For CLDR 1.5, using the survey tool, I've just gone through every Arabic locale (whose native zero digit isn't ASCII 0) and effectively preceded every solidus character in a Gregorian date format with an RTM, as matial suggested above.

This has produced uniform rendering behavior across browsers.

I've noted a bug still present in the CheckDates tool: it has a problem parsing the RTMs in a pattern.

Changed 3 years ago by guest

sent reply 5

Changed 3 years ago by mike.tardif(at)adobe.com

(Guest Reply)

In my previous reply, please read RTM as RLM.

Changed 3 years ago by guest

sent reply 6

Changed 2 years ago by mark

changed notes2

Changed 2 years ago by Mike Tardif <mike.tardif(at)adobe.com>

I fixed the CLDR 1.5 Arabic date formats using the survey tool.

Steven fixed the CLDR 1.5 CheckDates tool.

The current draft of UTS #35 does not yet mention lenient parsing of RLM and LRM.

Changed 2 years ago by tardif

sent reply 7

Changed 2 years ago by tardif

changed notes2

Changed 2 years ago by tardif

moved from data to docs

Changed 2 years ago by tardif

changed notes2

Changed 2 years ago by tardif

moved from docs to fixed

Changed 2 years ago by emmons

moved from fixed to closed
