About the Unicode Consortium Stability Policy
Unlike many other standards, the Unicode Standard is continually
expanding: new characters are added to meet a variety of uses, ranging from
technical symbols to letters for archaic languages. Character properties
are also expanded or revised to meet implementation requirements. However,
as the Unicode Standard becomes more widely deployed, changes to the
standard must be constrained by the requirements of backward
compatibility. To that end, the Unicode Consortium Character Encoding
Stability Policy limits the ways in which the standards developed by the
Unicode Consortium can change.
A primary requirement of stability is that the identity of each
character remains unchanged in all future versions of the Unicode Standard.
In other words, the same sequence of character codes continues to represent
the same text. Consequently, character codes must never be changed, and the
properties and behavior of a character must not change to the extent that
it affects the identity of the character. In addition, code sequences,
once normalized, must remain normalized.
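The normalization guarantee can be illustrated with a minimal sketch using Python's standard `unicodedata` module. It only shows that re-normalizing an already normalized string is a no-op within a single Unicode version; the policy's guarantee across versions cannot be demonstrated from one runtime. The helper name `is_stable_nfc` is hypothetical, and `unicodedata.is_normalized` is assumed to be available (Python 3.8 or later).

```python
import unicodedata

def is_stable_nfc(text: str) -> bool:
    """Return True if text is already in NFC, so normalizing it again changes nothing."""
    return unicodedata.is_normalized("NFC", text)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (a decomposed sequence).
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Normalizing the already-normalized string leaves it unchanged.
renormalized = unicodedata.normalize("NFC", composed)
```

Data stored in normalized form can therefore be compared or indexed without being renormalized each time the underlying Unicode data is upgraded, provided it contained only assigned characters when it was normalized.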
Additional guarantees restricting possible changes may be added to enable
implementers to make safe assumptions that allow more efficient and
compact implementations. For example, by limiting the possible distinct
values of the General Category, implementations may safely choose a packed
format for representing them.
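As a rough sketch of such a packed format, the following Python code stores each General Category value in 5 bits. It assumes the set of 30 two-letter General Category values listed below is closed, which is what makes a fixed 5-bit field safe. The names `pack_categories` and `unpack_category` are hypothetical helpers, not part of any library.

```python
import unicodedata

# The 30 two-letter General Category values (assumed here to be a closed set).
GENERAL_CATEGORIES = (
    "Lu", "Ll", "Lt", "Lm", "Lo",
    "Mn", "Mc", "Me",
    "Nd", "Nl", "No",
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Sm", "Sc", "Sk", "So",
    "Zs", "Zl", "Zp",
    "Cc", "Cf", "Cs", "Co", "Cn",
)
CATEGORY_INDEX = {name: i for i, name in enumerate(GENERAL_CATEGORIES)}
BITS_PER_CATEGORY = 5  # 2**5 = 32 slots, enough for 30 values

def pack_categories(text: str) -> int:
    """Pack the General Category of each character into consecutive 5-bit fields."""
    packed = 0
    for position, ch in enumerate(text):
        index = CATEGORY_INDEX[unicodedata.category(ch)]
        packed |= index << (position * BITS_PER_CATEGORY)
    return packed

def unpack_category(packed: int, position: int) -> str:
    """Read back the General Category stored at the given character position."""
    index = (packed >> (position * BITS_PER_CATEGORY)) & 0b11111
    return GENERAL_CATEGORIES[index]

packed = pack_categories("A1 ")
```

The caller must track the original length separately, because unused high bits decode as index 0. If the set of values could grow without limit, the field width could not be fixed in advance, which is the kind of assumption the guarantee is meant to make safe.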
Character names are immutable, so that they can be used as constant
external references to Unicode characters, in order to synchronize
identifiers for characters among standards, in particular ISO standards.
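Because names never change, they can serve as durable keys in source code and data files. A minimal Python sketch using the standard `unicodedata` module follows; the wrapper names `char_from_name` and `name_of` are hypothetical and exist only for illustration.

```python
import unicodedata

def char_from_name(name: str) -> str:
    """Resolve a Unicode character name to the character it identifies."""
    return unicodedata.lookup(name)

def name_of(ch: str) -> str:
    """Return the Unicode character name of a single character."""
    return unicodedata.name(ch)

# A name written into a program or data file keeps resolving to the same character.
small_a = char_from_name("LATIN SMALL LETTER A")
e_acute_name = name_of("\u00e9")
```

A reference such as `LATIN SMALL LETTER E WITH ACUTE` stored in another standard or a configuration file does not need to be revised when a new version of the Unicode Standard is published.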
In an ideal world, the information about existing Unicode characters would
be complete and correct at inception, so that maintenance of the standard
would be purely additive. However, due to the large number of characters,
each associated with many character properties, this ideal cannot be
achieved. Despite best efforts, clerical errors are sometimes introduced in
the publication process, and there are also cases where the initial
information about a character later proves incorrect or incomplete. In both cases,
corrections may be required. If these can be made without implying a change
to a character's identity, it is usually more beneficial to allow the change
than to freeze the mistake, and the stability policy reflects that.
Even if a proposed correction is not prohibited by the stability policy, it
must undergo an explicit approval process in the Unicode Technical Committee
(UTC), including an analysis of its costs and benefits.
In cases where the stability policy prevents a change, the UTC may take one
of several actions:
- add additional characters with the desired properties and behavior
- add additional properties
- provide documentation of the unmodifiable mistake
- add a character annotation or descriptive text in the standard
Occasionally, two separately encoded characters may prove to be
unintentional duplicates of each other. In such cases, stability prevents
removal of the duplicate character as this would impact existing data using
it. Instead, the character may be deprecated, which retains its definition
and properties, but strongly discourages its usage.