CLDR - Process

L2/06-186R

Date: 2006-05-09 (revised 2021-02-11)

Source: Mark Davis

This document presents a proposal for the addition of an explicit resolution procedure to the CLDR Technical Committee page on the web site. It was originally presented to the UTC in 2006 and filed in the document register as a complete web page. To avoid confusion with the actual current CLDR Technical Committee pages, the revision of 2021-02-11 has removed the web page framework, while leaving intact the proposed content change for the page which was discussed in 2006. -- Ken Whistler

CLDR Technical Committee

1. Introduction

This document describes the CLDR Technical Committee, and its process for Data Collection, Vetting, Public Feedback and Release. These procedures and process were based on the Unicode Technical Committee Procedures, but are modified to provide for a much lighter-weight process for data collection and release, with a much faster turn-around. In particular, the meetings are much more frequent, much shorter, and more informal. Most of the work is done by email or phone, with a database recording requested changes in data.

When gathering data for a region and language, it is important to have multiple sources for that data to produce the most widely acceptable data. Initial versions of data will be based on the best available sources, but the expectation is that CLDR data will be modified and improved, in successive versions, by more input from contributors inside and outside of the Unicode Consortium.

It is important to note that CLDR is a Repository, not a Registration. That is, contributors should not expect that their contributions will simply be adopted into the repository; instead, they will be vetted against the best available information.

All inputs will be open, and recorded in a bug/feature request database. Changes in response to items in the database may be entered into the repository snapshot over time by the maintainers of the repository, but the final approval of the release of any version of CLDR is up to the decision of the CLDR-TC. For more information, see the CLDR-TC Procedures.

2. Specification Changes

The LDML specification may be changed to add structure for new kinds of data or other features. Requests for changes are also entered in the bug/feature request database.

Structural changes are always backwards-compatible. That is, previous files will continue to work. Deprecated elements remain and can be used, although their usage is strongly discouraged.

There is a standing policy for structural changes that require non-trivial code for proper implementation, such as timezone fallback or alias mechanisms. These require the existence of at least a prototype implementation that demonstrates correct function according to the proposed specification.

3. Data Vetting Process

Once data for a country and language has been received, the data from the different sources will be compared to show agreements and differences. Initial data contributions are normally marked as "draft"; this will be changed once the data is vetted.

Note that there are two types of data in the repository:

a) Comparison data: The contributor can be an individual or an organization. Data from a software platform or product is normally gathered by calling public APIs, to ensure that the data matches what is actually in use. In either case, the data is only for comparison, and will not be changed except where necessary to update the data to match the external source. The only requirement is that all changed data be versioned, and that the Version Numbering Scheme be used.

b) Common Data: The content is decided upon by the CLDR-TC, following its procedures and this process.

Data contributed to CLDR from different sources may be in conflict. For example, a contribution on abbreviated month names may show each abbreviated name ending with a period and another contribution for the same abbreviated month names may not show the trailing period.

Members are encouraged to use local language and country contacts, inside and outside their organization, to help vet current common data and any new proposals for addition or amendment of common data. In particular, national standards organizations are encouraged to be involved in the data vetting process.

All people involved in vetting data must be familiar with the Data Formats and the requirements for Filing Bug Reports. In particular, they should compare any proposed changes against the data in the Comparison Charts and indicate which platforms the proposed changes align with, or whether they differ from all of the platforms.

3.1 Resolution Procedure

The following procedure will be used when resolving differences in submitted data.

There are multiple levels of access and control:

TC - (Technical Committee members)

Can vet and submit data for all locales

Can manage users in their organization

Can see the email addresses for all vetters

Vetter (some distinguished as experts)

Can vet and submit data for a particular set of locales (controlled by their TCs).

Cannot manage other users.

Can see the email addresses for submitted data in their locales.

Guest

Can vet and submit data for a particular set of locales (controlled by their TCs).

Cannot manage other users.

Cannot see email addresses.

A locale group is the set of all locales that share the same first field. For example, en plus anything that starts with "en_". The group of vetters and TCs who are authorized for a locale group is called the interest group for the locale.
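The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the function names `locale_group` and `group_locales` are hypothetical, and the sketch assumes locale identifiers use an underscore as the field separator, as in the "en_" example.

```python
def locale_group(locale_id: str) -> str:
    """Return the first field of a locale identifier, e.g. 'en' for 'en_US'."""
    return locale_id.split("_", 1)[0]


def group_locales(locale_ids):
    """Map each first field to the list of locales that share it (input order kept)."""
    groups = {}
    for locale_id in locale_ids:
        groups.setdefault(locale_group(locale_id), []).append(locale_id)
    return groups


groups = group_locales(["en", "en_US", "en_GB", "de", "de_AT", "sr_Cyrl_RS"])
print(groups["en"])  # the locale group for English
```

The interest group for a locale would then be whichever vetters and TCs are authorized for the key returned by `locale_group`.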

Fields with multiple proposals where vetters and TCs have not all voted for a single proposal are called conflicted. Correspondingly, a locale with any conflicted fields is a conflicted locale; a locale group with any conflicted locales is a conflicted locale group.
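As a rough sketch of these definitions, the following Python treats a field as conflicted when it has more than one proposal and the votes cast do not all name the same proposal. The `DataField` class, its attribute names, and the sample paths are hypothetical; the sketch also assumes that only votes actually cast are considered, and that the locale group is the part of the identifier before the first underscore.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    locale: str                                 # e.g. "en_US"
    path: str                                   # hypothetical identifier for the data item
    proposals: set                              # all proposed values for this field
    votes: dict = field(default_factory=dict)   # voter name -> chosen proposal


def is_conflicted(f: DataField) -> bool:
    """Multiple proposals, and the voters have not all chosen one proposal."""
    return len(f.proposals) > 1 and len(set(f.votes.values())) != 1


def conflicted_locales(fields):
    """A locale with any conflicted field is a conflicted locale."""
    return {f.locale for f in fields if is_conflicted(f)}


def conflicted_locale_groups(fields):
    """A locale group with any conflicted locale is a conflicted locale group."""
    return {loc.split("_", 1)[0] for loc in conflicted_locales(fields)}


fields = [
    DataField("en_US", "months/abbreviated/1", {"Jan", "Jan."},
              {"alice": "Jan", "bob": "Jan."}),
    DataField("en_GB", "months/abbreviated/1", {"Jan", "Jan."},
              {"alice": "Jan", "bob": "Jan"}),
    DataField("de", "months/abbreviated/1", {"Jan"}, {"carol": "Jan"}),
]
print(conflicted_locales(fields), conflicted_locale_groups(fields))
```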

The draft attribute in the data can have 3 states:

approved : approved by the technical committee (equals the CLDR 1.3 value of false, or an absent draft attribute). This does not mean that the data is guaranteed to be error-free -- this is the best judgment of the committee.

provisional: entered and confirmed by TC member or expert vetter. Implementations may choose to accept the provisional data, especially if there is no translated alternative.

unconfirmed: no confirmation available (entered by a guest without TC confirmation, or downgraded from provisional because of disagreement); equals the CLDR 1.3 value of true.
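An implementation that wants to choose how much unvetted data to accept could model the three states as an ordered enumeration. The sketch below is an assumption-laden illustration: the names `DraftStatus`, `parse_draft`, and `accept` are hypothetical, and the ordering (approved above provisional above unconfirmed) is inferred from the descriptions above rather than stated as a rule.

```python
from enum import IntEnum


class DraftStatus(IntEnum):
    # Assumed ordering: a higher value means more strongly confirmed.
    UNCONFIRMED = 0
    PROVISIONAL = 1
    APPROVED = 2


def parse_draft(value):
    """Interpret a draft attribute value; an absent attribute means approved."""
    if value is None:
        return DraftStatus.APPROVED
    return DraftStatus[value.upper()]


def accept(items, minimum=DraftStatus.APPROVED):
    """Keep the values whose draft status is at least `minimum`."""
    return [text for text, draft in items if parse_draft(draft) >= minimum]


# (value, draft attribute) pairs; None stands for an absent draft attribute.
items = [("Jan", None), ("Feb", "provisional"), ("Mar", "unconfirmed")]
print(accept(items))
print(accept(items, DraftStatus.PROVISIONAL))
```

Passing `DraftStatus.PROVISIONAL` as the minimum corresponds to an implementation that chooses to accept provisional data, for example where no translated alternative exists.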

In the vetting phase:

Anyone with an account (including submitters) can see a summary of all the conflicted fields for a given locale group.

Vetters can change their preferred value (a vote).

Anyone on the interest group, plus submitters, is notified (batched daily) if a vote is changed.

For each conflicted locale group, periodically (weekly at first, then daily) an email is sent to the interest group summarizing all the conflicted fields. The email will encourage the interest group to resolve any conflicts. The interest group will also have access to all the submitters' emails for the locale group, to ask questions and resolve issues.

Normally the "change to" field is frozen during the vetting process. However, in case of a conflicted field this can be appealed to the TC.

At the end of the vetting process:

All non-conflicted fields are normally accepted (the TC can override), subject to a quorum, and the conflicting fields are removed.

The quorum is at least one expert vetter, or at least 2 vetters from different organizations.

Conflicted fields can be decided by the TC, or remain conflicted (and remain provisional or unconfirmed).

This process can be fine-tuned by the technical committee as needed to resolve any problems that turn up.
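The quorum rule and the end-of-vetting outcome might be sketched as follows. This is a simplified reading, not the committee's tooling: the `Vetter` class, the outcome labels, and the assumption that the quorum is counted over the people who voted on the field are all hypothetical, and a TC override is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vetter:
    name: str
    organization: str
    is_expert: bool = False


def has_quorum(voters) -> bool:
    """At least one expert vetter, or at least 2 vetters from different organizations."""
    if any(v.is_expert for v in voters):
        return True
    return len({v.organization for v in voters}) >= 2


def resolve_field(votes):
    """votes maps Vetter -> chosen value; returns (value, outcome label)."""
    values = set(votes.values())
    if len(values) != 1:
        # Left for the TC to decide, or it stays provisional/unconfirmed.
        return None, "conflicted"
    value = next(iter(values))
    if has_quorum(list(votes)):
        return value, "accepted"
    return value, "no quorum"


ann = Vetter("ann", "OrgA")
ben = Vetter("ben", "OrgB")
cho = Vetter("cho", "OrgA")
eve = Vetter("eve", "OrgA", is_expert=True)

print(resolve_field({ann: "Jan", ben: "Jan"}))   # two organizations agree
print(resolve_field({ann: "Jan", cho: "Jan"}))   # same organization, no expert
print(resolve_field({eve: "Jan"}))               # a single expert vetter
print(resolve_field({ann: "Jan", ben: "Jan."}))  # disagreement
```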

3.2 Prioritization

There may be conflicting common practices or standards for a given country and language. Thus LDML provides keyword variants to reflect the different practices. For example, for German it allows the distinction between PHONEBOOK and DICTIONARY collation.

When there is an existing national standard for a country that is widely accepted in practice, the goal is to follow that standard as much as possible. Where the common practice in the country deviates from the national standard, or if there are multiple conflicting common practices, or options in conforming to the national standard, or conflicting national standards, multiple variants may be entered into the CLDR, distinguished by keyword variants or variant locale identifiers.

Where a data item is identified as following a particular national standard (or other reference), the goal is to keep that data aligned with that standard. There is, however, no guarantee that data will be tagged with any or all of the national standards that it follows.

4. Public Feedback Process

The public can supply formal feedback into CLDR by filing a Bug Report or Feature Request (RFE). Anyone can also ask to be added to a list that will receive notification of new CLDR bugs, so they can track issues if they want. Anyone will also be able to reply to any bug/RFE, to add comments (this latter feature is not yet functional, but will be soon).

To subscribe, send a note to "ecartis+unicode.org" (use an at sign instead of the +) and put "subscribe cldr-bugrfe" in the subject line.

To unsubscribe, put "unsubscribe cldr-bugrfe" in the subject line instead.

The email archive (in raw form) is in the CLDR Email Archives.

Public Review Issues may be posted in cases where broader public feedback is desired on a particular issue.

The public is also free to discuss locale issues on the Unicode list unicode@unicode.org, which is devoted to Unicode and related globalization standards (see Mail Lists to subscribe).

People should, however, be aware that changes and updates to CLDR will only be taken in response to complete, specific bug reports as described on Filing Bug Reports. Discussion on public mailing lists is not monitored; no actions will be taken in response to such discussion -- only in response to filed bugs. The process of checking and entering data takes time and effort; so even when bugs/RFEs are accepted, it may take some time before they are in a release of CLDR.

5. Data Release Process

5.1 Version Numbering

The locale data is frozen per version. Once a version is released, it is never modified. Any changes, however minor, will mean a newer version of the locale data being released. The versioning scheme is x.y.z, where z is incremented for bug fixes, y is incremented for any additions (such as new locale data or LDML elements), and x is incremented for any major changes in format.
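The x.y.z scheme can be illustrated with a small helper. The function name `bump` and the change labels are hypothetical, and resetting the lower components to zero when a higher one is incremented is an assumption; the text above specifies only which component is incremented.

```python
def bump(version: str, change: str) -> str:
    """Return the next version string for a given kind of change."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "format":      # major change in format
        return f"{x + 1}.0.0"
    if change == "addition":    # new locale data or LDML elements
        return f"{x}.{y + 1}.0"
    if change == "bugfix":      # bug fixes only
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.3.0", "bugfix"))
print(bump("1.3.1", "addition"))
print(bump("1.4.0", "format"))
```

Because a released version is never modified, every change, however minor, maps to one of these three increments.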

5.2 Release Schedule

Early releases of a version of the common locale data will be issued as either alpha or beta releases, available for public feedback. The dates for the next scheduled release will be on CLDR Project.

The schedule milestones are:

Design: All the proposed design changes have been accepted in place for changes in structure, and tools.

Structure, Tools, Documentation: All the DTD and specification changes are made according to proposed design. The tools are updated to support the new structure, including the survey tool for displaying, collecting, and vetting data.

Data Incorporation & Vetting: Using the survey tool, additional data is collected.

Beta: Complete version for public review; may not yet incorporate all bug fixes.

Release: Released, stable, referenceable version.

6. Meetings and Communication

The currently-scheduled meetings are listed on the Unicode Calendar. Meetings are held by phone, every week at 8:00 Pacific Time (-08:00 GMT in winter, -07:00 GMT in summer). Some meetings may be skipped if they conflict with holidays or other Unicode meetings.

There is an internal email list for the Unicode CLDR Technical Committee (CLDR-TC), open to Unicode members and invited experts. All national standards bodies who are interested in locale data are also invited to become involved by establishing a Liaison membership in the Unicode Consortium, to gain access to this list.

Notification of the telephone numbers and passcode, the agenda, and any change in schedule is sent out on this email list.

7. Officers

The current Technical Committee Officers are:

Chair: Mark Davis (IBM)

Vice-Chair: Deborah Goldsmith (Apple)