About Unicode Technical Reports

Unicode Technical Reports cover a wide range of topics related to the implementation or development of the Unicode Standard. These include topics such as:

  • normalizing Unicode text for comparison and storage
  • collating (sorting) Unicode strings
  • determining line break opportunities or other segmentation boundaries in text
  • regular expression syntax extensions for Unicode text
  • compressing Unicode text

These reports are normatively referenced by a number of international standards and by a wide range of products.

For a categorized list of the Unicode specifications, including specifications defined in the Unicode Technical Reports and those defined in other locations in the Unicode Standard, see the Specifications FAQ.

Types of Unicode Technical Reports: UAX, UTS, UTR

There are three types of technical reports, based on the authoritative status of the document:

A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document is always the same as the version of the Unicode Standard of which it forms a part.

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

A Unicode Technical Report (UTR) contains informative material. Conformance to the Unicode Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR.

As technical reports, including UAXes and UTSes, are developed, the Unicode Technical Committee approves the posting of proposed updates or preliminary versions for public review. Publication of these draft versions does not imply endorsement by the Unicode Consortium.

A Proposed Update of a UAX, UTS, or UTR contains the draft of a proposed modification of an already published UAX, UTS, or UTR.

A Draft Unicode Technical Report (DUTR) has the basic structure and content required for a new technical report, but has not yet received final approval for publication.

A Proposed Draft Unicode Technical Report (PDUTR) is in an early stage of development.

Any technical report that has Proposed Update, Draft, or Proposed Draft status is a preliminary document which may be updated, replaced, or superseded by other documents at any time. Such documents are not stable specifications; it is inappropriate to cite them as other than works in progress. Their status is always clearly indicated in the document.

Development Process

Technical reports are created by the Unicode Consortium Technical Committees (UTC, CLDR-TC, and ULI-TC) following open, consensus-oriented processes. For more information about the approval process, see the FAQ on the Technical Reports Development Process. When appropriate, the Public Review Issues page solicits review and feedback on initial drafts of, or updates to, technical reports.

UTS #10, Unicode Collation Algorithm, has an additional set of policies governing the maintenance of the basic data table used in assigning collation weights to characters. See Change Management for the Unicode Collation Algorithm and UCA Default Table Criteria for New Characters.

Versioning

Each technical report has a unique and persistent report number that is part of its title. For example, in UTS #10, Unicode Collation Algorithm, the "10" permanently identifies that specification, and never changes as the document is updated. Each technical report also has a revision number, which is then used to track and identify each proposed update and each approved publication of the document. UAXes and UTSes have a separate version number, in addition to their revision number. The details of the meaning and assignment of version numbers for those types of technical reports are specified below.

For information about citing versions of technical reports, see Versions of the Unicode Standard.

Revision Numbering

Uniform and persistent revision numbers are used for all technical reports. This revision number is incremented and a new URL reflecting that revision number is provided each time the file content is altered materially. Modifications to the report are summarized in the change history section of each document.

Revision numbers are reflected directly in the permanent, versioned file names used in the URL for the document. Thus, Revision 30 of UTS #10 is named .../reports/tr10/tr10-30.html, while the earlier Revision 28 of the same technical report is named .../reports/tr10/tr10-28.html, and so on. There were some departures from this scheme for very early publications, but this naming convention is now followed for all technical reports.
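The naming convention above can be sketched as a small helper. This is a minimal illustration, not an official tool: the function name `revision_path` is hypothetical, and it produces only the part of the URL shown in the text (from `reports/` onward), leaving the site prefix out. It also assumes the current convention applies, which the text notes was not always true for very early publications.

```python
def revision_path(report_number: int, revision: int) -> str:
    """Build the versioned file path for one revision of a technical report.

    Hypothetical helper: follows the tr<N>/tr<N>-<revision>.html pattern
    described in the text and omits the site prefix.
    """
    if report_number < 1 or revision < 1:
        # Assumption: report and revision numbers are positive whole numbers.
        raise ValueError("report and revision numbers must be positive integers")
    return f"reports/tr{report_number}/tr{report_number}-{revision}.html"


print(revision_path(10, 30))  # reports/tr10/tr10-30.html
print(revision_path(10, 28))  # reports/tr10/tr10-28.html
```

Because the report number never changes, only the revision argument varies between successive publications of the same report.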

Because revision numbers use whole numbers, rather than a major.minor.update version syntax, some technical reports have large revision numbers. Many of the revision number changes, however, reflect minor editorial changes to the documents, as opposed to substantive changes to their contents.

Minor editorial corrections such as fixing a broken link may be made without assigning a new revision number. In such cases the date in the report header will, however, be updated, to indicate that a micro-edit has occurred.
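The distinction between a new revision and a micro-edit can be sketched as a comparison of two report headers. This is only an illustration under stated assumptions: the `ReportHeader` type, the `classify_change` function, and the sample dates are all hypothetical, and the only rule encoded is the one stated above (a micro-edit keeps the revision number but changes the header date).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReportHeader:
    """Hypothetical record of the two header fields discussed in the text."""
    revision: int
    header_date: date


def classify_change(old: ReportHeader, new: ReportHeader) -> str:
    """Classify the difference between two snapshots of the same report."""
    if new.revision != old.revision:
        return "new revision"   # content altered materially
    if new.header_date != old.header_date:
        return "micro-edit"     # same revision number, updated date
    return "unchanged"


# Illustrative values only; these dates are not taken from any real report.
before = ReportHeader(revision=30, header_date=date(2020, 1, 1))
after = ReportHeader(revision=30, header_date=date(2020, 2, 1))
print(classify_change(before, after))  # micro-edit
```

A reader who caches a report by revision number alone would miss such micro-edits, which is why the header date is part of the comparison here.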

Revision Back Link Trail

Each technical report has links to the previous approved revision of the report and to the latest approved revision of the report, allowing readers to find and cite a particular revision. The back links to previous approved revisions can be followed all the way back to the initial drafts of the documents, if so desired, allowing examination of the complete history of the specification.

Proposed update revisions of documents are not included in the back link trail of previous approved revisions, but any specific proposed update can still be accessed on the Unicode website by using the relevant revision number of that proposed update in the URL for the document.

Version Numbers for UAXes

Because each UAX is formally a part of a version of the Unicode Standard, it is given a version number in addition to its revision number. The version number always matches the version of the Unicode Standard of which it forms a part, and it uses the same major.minor.update format as the Unicode Standard. For details regarding how major.minor.update versioning is used for the Unicode Standard, see About Versions of the Unicode Standard.

Version Numbers for UTSes

UTSes also have a version number in addition to their revision number, but the conventions for assigning version numbers to UTSes differ somewhat from those for UAXes.

In some cases UTSes use major.minor format version numbers to distinguish minor updates of the documents from major changes in the specification. Such version numbers apply only to the UTS in question and are not synchronized with versions of the Unicode Standard.

In other cases a UTS may have associated data which is maintained in synchrony with repertoire additions to the Unicode Standard. In those cases, the UTS may be given a major.minor.update version number which matches the version of the Unicode Standard reflected in the data files.

Over time, a particular UTS may change its maintenance status, and change from development that is not synchronized with releases of the Unicode Standard to development which is synchronized with those releases. When this happens, the version numbering scheme for the specification changes as well. For example, UTS #39, Unicode Security Mechanisms, was first published as Version 2, then Version 3, but changed to Version 6.3.0 when its maintenance mode changed to synchronize with the Unicode Standard, Version 6.3.0.
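The version syntaxes described in this section can be handled with one small parser. The sketch below is an assumption-laden illustration: the names `parse_version` and `has_update_field` are hypothetical, and a three-field version only suggests, rather than proves, that a UTS is synchronized with the Unicode Standard.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a version such as "2", "6.3", or "6.3.0" into integers."""
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version syntax: {text!r}")
    # int() raises ValueError for empty or non-numeric fields.
    return tuple(int(part) for part in parts)


def has_update_field(text: str) -> bool:
    """True if the version uses the three-field major.minor.update form."""
    return len(parse_version(text)) == 3


# Version history of UTS #39 as given in the text.
for version in ["2", "3", "6.3.0"]:
    print(version, parse_version(version), has_update_field(version))
```

Parsing into tuples keeps comparisons numeric, so that, for example, a hypothetical 10.0.0 sorts after 6.3.0 rather than before it as plain string comparison would.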

Data Files for Technical Reports

Data files associated with UAXes are formally a part of the Unicode Character Database (UCD), an integral part of each version of the Unicode Standard. Such data files are updated for each release of the Unicode Standard. Often no substantive changes are required for particular data files, but their version numbers are incremented and new files are published, so that implementers have a complete set of data for each release.

Some UTSes are also associated with data files. If the UTS is maintained in synchrony with the Unicode Standard, then its data files are also updated with each release, and the naming conventions for the UTS data directories also reflect the version numbering. However, such data files are not formally a part of the Unicode Character Database.

UTRs may also have associated data files. In such cases, because a UTR has no version number distinct from its revision number, the associated data files are published in a data directory which reflects the revision number. A new revision data directory is created each time the UTR itself is updated, but such revisions are not synchronized with releases of the Unicode Standard.

Data files for UTSes or UTRs are maintained in separate folders under http://www.unicode.org/Public/. The location of each set of data files is documented in the corresponding UTS or UTR. Each folder contains a complete set of data files for that version of the document.

Stable References to Sections of Technical Reports

The UAXes, UTSes, and some of the UTRs have stable HTML anchors defined for section headers. These enable direct links to those sections. More recent versions of technical reports have stable HTML anchors for tables, figures, each formal rule or definition, and the modification history of the document.

Superseded or Withdrawn Reports

Occasionally, the material of a report is incorporated into another document. For example, UAX #13, Newline Guidelines, became Section 5.8, Newline Guidelines, in the Core Specification of the Unicode Standard as of Version 4.0. Such reports are considered superseded and are listed in their own section on the Technical Reports page.

Instead of a link to the latest approved version, the reports page has a link to a page explaining the change in status. Where applicable, that page also provides information on where the material was incorporated. A similar page is provided for reports which have been formally withdrawn. The numbers of superseded or withdrawn reports are never reused.

Old Versions of the Unicode Standard

Prior to 2003, several minor versions of the Unicode Standard were published as Unicode Technical Reports or as Unicode Standard Annexes. Such reports are listed in their own section on the Technical Reports page. Instead of a direct link to the last approved document for those minor versions, the reports page has a link to a page providing a summary for that version of the Unicode Standard.

Errata

Errata to technical reports and other specifications may be posted on the Updates and Errata page. To report errors in published documents, such as the Unicode Standard or technical reports, use the Unicode Consortium's contact form.

