CLDR Technical Committee
1. Introduction
This document describes the CLDR Technical Committee and its process for Data
Collection, Vetting, Public Feedback and Release. These procedures and processes are based
on the Unicode Technical Committee Procedures, but are modified to provide a much
lighter-weight process for data collection and release, with a much faster turnaround. In
particular, the meetings are much more frequent, much shorter, and more informal. Most of the
work is done by email or phone, with a database recording requested changes in data.
When gathering data for a region and language, it is important to have multiple sources
for that data to produce the most widely acceptable data. Initial versions of data will
be based on the best available sources, but the expectation is that CLDR data will be
modified and improved, in successive versions, by more input from contributors inside
and outside of the Unicode Consortium.
It is important to note that CLDR is a Repository, not a Registration. That is,
contributors should not expect that their contributions will simply be adopted into the
repository; instead, they will be vetted against the best available information.
All inputs will be open, and recorded in a bug/feature request database. Changes in
response to items in the database may be entered into the repository snapshot over time by
the maintainers of the repository, but final approval of the release of any version of
CLDR rests with the CLDR-TC. For more information, see the CLDR-TC Procedures.
2. Specification Changes
The LDML specification may be changed to add structure for new kinds of data or other
features. Requests for changes are also entered in the bug/feature request database.
Structural changes are always backwards-compatible. That is, previous files will
continue to work. Deprecated elements remain and can be used, although their usage is
strongly discouraged.
There is a standing policy for structural changes that require non-trivial code for
proper implementation, such as timezone fallback or alias mechanisms. These require the
existence of at least a prototype implementation that demonstrates correct function
according to the proposed specification.
3. Data Vetting Process
Once data for a country and language has been received, the data from the different
sources will be compared to show agreements and differences. Initial data contributions
are normally marked as "draft"; this will be changed once the data is vetted.
Note that there are two types of data in the repository:
a) Comparison data: The contributor can be an individual or an organization. Data from
a software platform or product is normally gathered by calling public APIs, to ensure
that the data matches what is actually in use. In either case, the data is only for
comparison, and will not be changed except where necessary to update the data to
match the external source. The only requirement is that all changed data be versioned, and
that the Version Numbering Scheme be used.
b) Common data: The contents are decided upon by the CLDR-TC, following its procedures
and this process.
Data contributed to CLDR from different sources may be in conflict. For example, one
contribution of abbreviated month names may end each abbreviated name with a period,
while another contribution for the same month names may omit the trailing period.
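The comparison of contributions described above can be sketched in a few lines. This is only an illustration: the source names, the field keys such as "month-abbr-1", and the function `compare_sources` are hypothetical and are not part of any CLDR tool.

```python
def compare_sources(contributions):
    """Split fields into agreements and differences across sources.

    contributions maps a (hypothetical) source name to a dict of
    field key -> value. A field supplied by only one source is
    treated as an agreement in this sketch.
    """
    agreements, differences = {}, {}
    fields = sorted({f for data in contributions.values() for f in data})
    for field in fields:
        values = {src: data[field]
                  for src, data in contributions.items() if field in data}
        if len(set(values.values())) == 1:
            agreements[field] = next(iter(values.values()))
        else:
            differences[field] = values
    return agreements, differences


# Two made-up contributions that disagree on the trailing period.
contributions = {
    "source_a": {"month-abbr-1": "Jan.", "month-abbr-5": "May"},
    "source_b": {"month-abbr-1": "Jan", "month-abbr-5": "May"},
}
agreements, differences = compare_sources(contributions)
print(agreements)
print(differences)
```

Here the first field lands in `differences` because the two sources disagree on the period, while the second lands in `agreements`.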
Members are encouraged to use local language and country contacts, inside and outside
their organization, to help vet current common data and any new proposals for addition or
amendment of common data. In particular, national standards organizations are encouraged
to be involved in the data vetting process.
All people involved in vetting data must be familiar with the Data Formats and the
requirements for Filing Bug Reports. In particular, they should compare any proposed
changes against the data in the Comparison Charts and indicate which platforms the
proposed changes align with, or whether they are different than all of the platforms.
The following procedure will be used when resolving differences in submitted data.
There are multiple levels of access and control:
TC (Technical Committee members)
- Can vet and submit data for all locales.
- Can manage users in their organization.
- Can see the email addresses for all vetters.
Vetter (some distinguished as experts)
- Can vet and submit data for a particular set of locales (controlled by their TCs).
- Cannot manage other users.
- Can see the email addresses for submitted data in their locales.
Guest
- Can vet and submit data for a particular set of locales (controlled by their TCs).
- Cannot manage other users.
- Cannot see email addresses.
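The access levels above amount to a small permission table. A minimal sketch follows, assuming each vetter or guest carries a set of assigned locales; the names `Role`, `can_submit`, `can_manage_users`, and `EMAIL_VISIBILITY` are made up for illustration and do not describe the actual survey tool.

```python
from enum import Enum


class Role(Enum):
    TC = "TC"
    VETTER = "Vetter"
    GUEST = "Guest"


def can_submit(role, locale, assigned_locales=frozenset()):
    """TC members may vet and submit for all locales; others only for assigned ones."""
    if role is Role.TC:
        return True
    return locale in assigned_locales


def can_manage_users(role):
    """Only TC members manage users (and only within their own organization)."""
    return role is Role.TC


# Which email addresses each role may see, as described in the list above.
EMAIL_VISIBILITY = {
    Role.TC: "all vetters",
    Role.VETTER: "submitted data in own locales",
    Role.GUEST: "none",
}

print(can_submit(Role.VETTER, "fr", {"de", "de_CH"}))
print(can_manage_users(Role.TC))
```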
A locale group is the set of all locales that share the same first field. For
example, en plus anything that starts with "en_". The group of vetters and TCs who are
authorized for a locale group is called the interest group for the locale.
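As a small illustration of the locale group definition, the first field can be taken as everything before the first underscore. The helper names `locale_group` and `group_locales` are hypothetical.

```python
def locale_group(locale_id):
    """Return the first field of a locale ID, which names its locale group."""
    return locale_id.split("_", 1)[0]


def group_locales(locale_ids):
    """Bucket locale IDs by locale group, preserving input order."""
    groups = {}
    for locale_id in locale_ids:
        groups.setdefault(locale_group(locale_id), []).append(locale_id)
    return groups


groups = group_locales(["en", "en_US", "en_GB", "de", "de_CH"])
print(groups)
```

With this input, en, en_US and en_GB fall into the "en" group, and de and de_CH into the "de" group.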
Fields with multiple proposals where vetters and TCs have not all voted for a single
proposal are called conflicted. Correspondingly, a locale with any conflicted fields is a
conflicted locale; a locale group with any conflicted locales is a conflicted locale group.
The draft attribute in the data can have 3 states:
- approved: approved by the technical committee (equals the CLDR 1.3 value of false, or an
absent draft attribute). This does not mean that the data is guaranteed to be error-free;
this is the best judgment of the committee.
- provisional: entered and confirmed by TC member or expert vetter. Implementations may
choose to accept the provisional data, especially if there is no translated alternative.
- unconfirmed: no confirmation available; entered by a guest without TC confirmation, or
downgraded from provisional because of disagreement (equals the CLDR 1.3 value of true).
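A reader handling both older and newer data might normalize the draft attribute along the following lines. This is a sketch under stated assumptions: the function name `normalize_draft` is hypothetical, and the legacy value true is assumed to correspond to unconfirmed, as the counterpart of false (or an absent attribute) meaning approved.

```python
from typing import Optional

DRAFT_STATES = ("approved", "provisional", "unconfirmed")

# Assumed mapping from the CLDR 1.3 boolean values to the three states.
LEGACY_DRAFT_VALUES = {"false": "approved", "true": "unconfirmed"}


def normalize_draft(value: Optional[str]) -> str:
    """Map a draft attribute value (or its absence) to one of the three states."""
    if value is None:
        # An absent draft attribute is treated as approved.
        return "approved"
    state = LEGACY_DRAFT_VALUES.get(value, value)
    if state not in DRAFT_STATES:
        raise ValueError(f"unknown draft value: {value!r}")
    return state


print(normalize_draft(None))
print(normalize_draft("provisional"))
```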
In the vetting phase:
- Anyone with an account (including submitters) can see a summary of all the conflicted
fields for a given locale group.
- Vetters can change their preferred value (a vote).
- Anyone on the interest group, plus submitters, is notified (batched daily) if a vote is
changed.
- For each conflicted locale group, periodically (weekly at first, then daily) an email is
sent to the interest group summarizing all the conflicted fields. The email will encourage
the interest group to resolve any conflicts. The interest group will also have access to
all the submitters' emails for the locale group, to ask questions and resolve issues.
- Normally the "change to" field is frozen during the vetting process. However, in case of
a conflicted field this can be appealed to the TC.
At the end of the vetting process:
- All non-conflicted fields are normally accepted (the TC can override), subject to a
quorum, and the conflicting fields are removed.
- The quorum is at least one expert vetter, or at least 2 vetters from different
organizations.
- Conflicted fields can be decided by the TC, or remain conflicted (and remain
provisional or unconfirmed).
This process can be fine-tuned by the technical committee as needed to resolve any
problems that turn up.
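The end-of-vetting rules can be sketched as a small decision function. This is a simplified reading, not the committee's tooling: it assumes a field is conflicted whenever its votes name more than one value, it ignores TC overrides and appeals, and the names `Vote`, `is_conflicted`, `has_quorum`, and `resolve_field` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    organization: str
    is_expert: bool
    value: str


def is_conflicted(votes):
    """A field is conflicted if the voters have not all chosen one value."""
    return len({v.value for v in votes}) > 1


def has_quorum(votes):
    """Quorum: at least one expert vetter, or vetters from 2+ organizations."""
    if any(v.is_expert for v in votes):
        return True
    return len({v.organization for v in votes}) >= 2


def resolve_field(votes):
    """Return (status, value) for one field at the end of vetting."""
    if not votes:
        return ("no votes", None)
    if is_conflicted(votes):
        # Left for the TC to decide, or to remain conflicted.
        return ("conflicted", None)
    if has_quorum(votes):
        return ("accepted", votes[0].value)
    return ("no quorum", None)


votes = [
    Vote("vetter1", "OrgA", False, "Jan."),
    Vote("vetter2", "OrgB", False, "Jan."),
]
print(resolve_field(votes))
```

In this example two non-expert vetters from different organizations agree, so the quorum is met and the value is accepted.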
There may be conflicting common practices or standards for a given country and
language. Thus LDML provides keyword variants to reflect the different practices. For
example, for German it allows the distinction between PHONEBOOK and DICTIONARY collation.
When there is an existing national standard for a country that is widely accepted in
practice, the goal is to follow that standard as much as possible. Where the common
practice in the country deviates from the national standard, or if there are multiple
conflicting common practices, or options in conforming to the national standard, or
conflicting national standards, multiple variants may be entered into the CLDR,
distinguished by keyword variants or variant locale identifiers.
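To make the idea of keyword variants concrete, the sketch below splits an identifier of the form de@collation=phonebook into its base locale and its keywords. The "@key=value;key=value" layout is assumed here for illustration, and `parse_locale_keywords` is a hypothetical helper, not an API of any CLDR library.

```python
def parse_locale_keywords(locale_id):
    """Split 'base@key=value;key=value' into (base, {key: value})."""
    base, _, keyword_part = locale_id.partition("@")
    keywords = {}
    for pair in keyword_part.split(";"):
        if not pair:
            # No keywords at all, or a stray trailing semicolon.
            continue
        key, _, value = pair.partition("=")
        keywords[key.strip()] = value.strip()
    return base, keywords


print(parse_locale_keywords("de@collation=phonebook"))
print(parse_locale_keywords("de"))
```

Two German collation practices can then be told apart by the value stored under the collation keyword, while both share the base locale de.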
Where a data item is identified as following a particular national standard (or other
reference), the goal is to keep that data aligned with that standard. There is, however,
no guarantee that data will be tagged with any or all of the national standards that it
follows.
4. Public Feedback Process
The public can supply formal feedback into CLDR by filing a Bug Report or Feature
Request (RFE). Anyone can also ask to be added to a list that will receive notification
of new CLDR bugs, so they can track issues if they want. Anyone will also be able to reply
to any bug/RFE to add comments (this latter feature is not yet functional, but will be
soon).
- To subscribe, send a note to "ecartis+unicode.org" (use an at sign instead of
the +) and put "subscribe cldr-bugrfe" in the subject line.
- To unsubscribe, put "unsubscribe cldr-bugrfe" in the subject line instead.
The email archives (in raw form) are in the CLDR Email Archives.
Public Review Issues may be posted in cases where broader public feedback is desired on
a particular issue. The public is also free to discuss locale issues on the Unicode list
unicode@unicode.org, which is devoted to Unicode and related globalization standards (see
Mail Lists to subscribe).
People should, however, be aware that changes and updates to CLDR will only be made in
response to complete, specific bug reports as described on Filing Bug Reports. Discussion
on public mailing lists is not monitored; no actions will be taken in response to such
discussion, only in response to filed bugs. The process of checking and entering data
takes time and effort, so even when bugs/RFEs are accepted, it may take some time before
they are in a release of CLDR.
5. Data Release Process
The locale data is frozen per version. Once a version is released, it is never
modified. Any changes, however minor, will mean a newer version of the locale data being
released. The versioning scheme is x.y.z, where z is incremented for bug fixes, y is
incremented for any additions (such as new locale data or LDML elements), and x is
incremented for any major changes in format.
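The x.y.z scheme can be illustrated with a short helper. Note the assumptions: resetting the lower components to zero after a higher one is incremented is not stated in the text, and the change labels "bugfix", "addition", and "format" as well as the name `bump_version` are made up for this sketch.

```python
def bump_version(version, change):
    """Return the next x.y.z version for a given kind of change."""
    x, y, z = (int(part) for part in version.split("."))
    if change == "format":
        # Major change in format: increment x (reset of y and z is assumed).
        return f"{x + 1}.0.0"
    if change == "addition":
        # New locale data or LDML elements: increment y (reset of z is assumed).
        return f"{x}.{y + 1}.0"
    if change == "bugfix":
        # Bug fix only: increment z.
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.3.0", "bugfix"))
print(bump_version("1.3.1", "addition"))
```

Because a released version is never modified, every change, however minor, yields a new version string rather than an update in place.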
Early releases of a version of the common locale data will be issued as either alpha or
beta releases, available for public feedback. The dates for the next scheduled release
will be on CLDR Project.
The schedule milestones are:
- Design: All the proposed design changes for structure and tools have been accepted.
- Structure, Tools, Documentation: All the DTD and specification changes are made
according to the proposed design. The tools are updated to support the new structure,
including the survey tool for displaying, collecting, and vetting data.
- Data Incorporation & Vetting: Using the survey tool, additional data is collected.
- Beta: Complete version for public review; may not yet incorporate all bug fixes.
- Release: Released, stable, referenceable version.
6. Meetings and Communication
The currently scheduled meetings are listed on the Unicode Calendar. Meetings are
held by phone, every week at 8:00 Pacific Time (GMT-08:00 in winter, GMT-07:00 in
summer). Some meetings may be skipped if they conflict with holidays or other Unicode
meetings.
There is an internal email list for the Unicode CLDR Technical Committee (CLDR-TC),
open to Unicode members and invited experts. All national standards bodies who are
interested in locale data are also invited to become involved by establishing a
Liaison membership in the Unicode Consortium, to gain access to this list.
Notification of the telephone numbers and passcode, the agenda, and any change in
schedule is sent out on this email list.
7. Officers
The current Technical Committee Officers are:
- Chair: Mark Davis (IBM)
- Vice-Chair: Deborah Goldsmith (Apple)