BETA Unicode 6.3.0

The next version of the Unicode Standard will be Version 6.3.0, planned for release in September 2013. The major feature of this release is a very substantial enhancement of the Unicode Bidirectional Algorithm, which potentially affects all text displayed in the Arabic and Hebrew scripts. Five new format control characters are added to the standard as part of the changes for the Unicode Bidirectional Algorithm. The Unicode Collation Algorithm has also been changed in ways that will affect many string comparisons.

A beta version of the 6.3.0 Unicode Character Database files is available for public review. We strongly encourage implementers to review the summary description, download the beta 6.3.0 Unicode Character Database files, and test their programs with the new data, well before the end of the beta period. It is especially important to review the Notable Issues for Beta Reviewers.

We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.3.0 and to ensure that there are no regressions in glyph shapes for previously encoded characters.

Summary description: Unicode 6.3.0
Unicode Character Database (UCD): http, ftp
Summary of beta charts: Readme.txt
Single-block charts with yellow highlighting for new characters: delta charts
Single-block charts for all of Unicode 6.3.0: http, ftp
Code charts, single download (95MB): http, ftp

Related Unicode Technical Standards

In addition to the Unicode Standard proper, two other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 6.3.0. Review of that text and data is also encouraged during the beta review period. 

Review and Feedback

For guidance on how to focus your review, see the section Notable Issues for Beta Reviewers.

Any feedback should be reported using the contact form. Comments on the Unicode Standard Version 6.3.0 or on the Unicode Character Database data files should refer to the beta review Public Review Issue #249. Comments on specific Version 6.3.0 UAXes and UTSes should refer to the respective Public Review Issue number for each document.

The comment period ends July 22, 2013. All substantive technical comments must be received by that date for consideration at the August UTC meeting. Editorial comments (typos, etc.) may still be submitted after that date for consideration in the final editorial work.

Note: All beta files may be updated, replaced, or superseded by other files at any time. The beta files will be discarded once Unicode 6.3.0 is final. It is inappropriate to cite these files as other than a work in progress. No products or implementations should be released based on the beta UCD data files. Use only the final, approved Version 6.3.0 data files, expected in September 2013.

The Unicode Consortium provides early access to updated versions of the data files and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of Version 6.3.0.

The assignment of characters for Unicode 6.3.0 is now stable. There will be no further additions or modifications of code points and no further changes to character names. Please do not submit feedback requesting changes to code points or character names for Unicode 6.3.0, as such feedback is not actionable.

One of the main purposes of the beta review period is to verify and correct the preliminary character property assignments in the Unicode Character Database. Reviewers should check for property changes to existing Unicode 6.2.0 characters, as well as the property values for the new Unicode 6.3.0 character additions.

To facilitate verification of the property changes and additions, diffable XML versions of the Unicode Character Database are available. These XML files are dated, so that people can check the details of changes that occurred during the beta review period. The XML files are in the http://www.unicode.org/Public/6.3.0/diffs/ directory. For more information, see the diffs.readme.txt file.
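As a complement to the XML diffs, a reviewer who keeps local copies of two versions of a semicolon-delimited data file can compare them directly. The following is a minimal sketch, assuming the UnicodeData.txt field layout (code point, name, General_Category, Canonical_Combining_Class, Bidi_Class, decomposition, and so on). The function names and the sample records are hypothetical illustrations, not contents copied from the beta files; only the Zs to Cf change for U+180E and the addition of U+061C are taken from this page. Range records (First/Last pairs) are not handled.

```python
# Field positions in the semicolon-delimited UnicodeData.txt layout.
FIELDS = {
    2: "General_Category",
    3: "Canonical_Combining_Class",
    4: "Bidi_Class",
    5: "Decomposition",
}


def parse(lines):
    """Map code point -> list of fields, skipping blank and comment lines."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(";")
        records[int(parts[0], 16)] = parts
    return records


def diff_properties(old_lines, new_lines):
    """Return (code point, what, before, after) tuples for additions and changes."""
    old, new = parse(old_lines), parse(new_lines)
    changes = []
    for cp in sorted(new):
        if cp not in old:
            changes.append((cp, "added", None, new[cp][1]))
            continue
        for idx, name in FIELDS.items():
            if old[cp][idx] != new[cp][idx]:
                changes.append((cp, name, old[cp][idx], new[cp][idx]))
    return changes


# Illustrative records only; the Bidi_Class values here are assumptions.
OLD = ["180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;"]
NEW = [
    "061C;ARABIC LETTER MARK;Cf;0;AL;;;;;N;;;;;",
    "180E;MONGOLIAN VOWEL SEPARATOR;Cf;0;BN;;;;;N;;;;;",
]

for cp, what, before, after in diff_properties(OLD, NEW):
    print(f"U+{cp:04X} {what}: {before} -> {after}")
```

In practice the two lists would come from reading the 6.2.0 and beta 6.3.0 files from disk; the inline samples keep the sketch self-contained.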

The beta review period is a good opportunity to add support for the new Unicode 6.3.0 characters in internal versions of software, so that it can be tested to verify that the new characters and property assignments do not cause problems after an upgrade to Version 6.3.0 of Unicode.

Notable Issues for Beta Reviewers

Some of the Unicode Standard Annexes have substantial modifications for Unicode 6.3.0, often in coordination with changes to character properties. Most notably for Unicode 6.3.0, UAX #9 has been modified very extensively to extend the Unicode Bidirectional Algorithm to handle isolate spans and to do parenthesis matching. In particular, the parenthesis matching will change the display of existing text under certain conditions.

Please also check the following specific items carefully:

  • The Bidi_Class property has been extended with new values to cover four new bidirectional format control characters for isolate spans.
  • A new implicit bidirectional format control character, U+061C ARABIC LETTER MARK (ALM), has been added to improve number formatting in an Arabic context.
  • Two new bidi-related properties have been added in a new data file, BidiBrackets.txt, to support parenthesis matching in the Bidirectional Algorithm.
  • There have been substantial additions to BidiTest.txt, and a new conformance test file, BidiCharacterTest.txt, has been added, in support of testing the new extensions to the Bidirectional Algorithm.
  • The General_Category of U+180E MONGOLIAN VOWEL SEPARATOR has been changed from Zs to Cf. This may impact implementations of Mongolian.
  • There are additional property changes listed in UAX #44, Unicode Character Database, that may affect some implementations.
  • Informative property values in the Unihan database for many CJK unified ideographs have been changed.
  • UCA default data has changed significantly. Check for differences in decimal numbers and in compatibility characters with casing or with variant tertiaries.
  • The UCA primary weights from 0xFFFD to 0xFFFF are now reserved for special collation elements, so ensure that implementations are handling these values properly.
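One way to start reviewing the new bracket data is to load it and check that each pairing is symmetric. The sketch below assumes BidiBrackets.txt uses the common UCD layout of semicolon-separated fields with `#` comments: code point, paired bracket code point, and bracket type (`o` for open, `c` for close). That layout, the function names, and the sample lines are assumptions for illustration; check them against the beta file before relying on them.

```python
def parse_bidi_brackets(lines):
    """Map code point -> (paired code point, bracket type)."""
    table = {}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        cp, paired, btype = (field.strip() for field in data.split(";"))
        table[int(cp, 16)] = (int(paired, 16), btype)
    return table


def pairs_are_symmetric(table):
    """True if every bracket's partner is listed, points back, and has the opposite type."""
    for cp, (paired, btype) in table.items():
        if paired not in table:
            return False
        back, other_type = table[paired]
        if back != cp or {btype, other_type} != {"o", "c"}:
            return False
    return True


# Illustrative lines in the assumed layout, not copied from the beta file.
SAMPLE = """\
# code point; paired bracket; type
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
""".splitlines()

table = parse_bidi_brackets(SAMPLE)
print(pairs_are_symmetric(table))
```

The same loader can be pointed at the real file with `open(path, encoding="utf-8")`, since it accepts any iterable of lines.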

General Issues

For current proposed updates to the particular UAXes, see Proposed Updates for Standard Annexes or use the links in the navigation bar on this page. Particular issues in the UAXes may also be the focus of specific Public Review Issues. Each proposed textual change in a UAX is highlighted, so that you can focus your review on those sections if you have limited time. The changes are also listed in detail in the Modifications sections (linked from the table of contents of each document), and are summarized in UAX changes, so you can check on those areas that might be of most interest.

Some links between beta documents and the proposed updates for UAXes will not work correctly during the beta review period. This is a known problem that does not need to be reported: such links point to the eventual final names or revision numbers of the released versions.

Stability

Certain character properties for newly assigned characters cannot be changed after the formal release of each version of the standard, because of the Character Encoding Stability Policy. Such character property values need special attention during the beta review process, as they cannot be corrected after publication. These include:

  • Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.
  • The determination of whether a character is included in identifiers (XID_Start, XID_Continue).
  • Case mappings and case foldings.
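To make these categories concrete, the sketch below collects the stability-sensitive values for a single character using Python's standard `unicodedata` module and built-in string methods. It reflects whichever Unicode version the running Python was built with (reported by `unicodedata.unidata_version`), not the beta 6.3.0 data, and `str.isidentifier()` is only an approximation of XID_Start and XID_Continue because Python also admits the underscore. The helper name `stability_snapshot` is hypothetical.

```python
import unicodedata


def stability_snapshot(ch):
    """Collect the values for one character that stability policy freezes after release."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Normalization-related properties
        "decomposition": unicodedata.decomposition(ch),
        "combining_class": unicodedata.combining(ch),
        "nfc": unicodedata.normalize("NFC", ch),
        "nfd": unicodedata.normalize("NFD", ch),
        # Identifier membership (approximation of XID_Start / XID_Continue)
        "identifier_start": ch.isidentifier(),
        "identifier_continue": ("a" + ch).isidentifier(),
        # Case mappings and case folding
        "lower": ch.lower(),
        "upper": ch.upper(),
        "casefold": ch.casefold(),
    }


print("Unicode data version:", unicodedata.unidata_version)
for key, value in stability_snapshot("\u00e9").items():
    print(f"{key}: {value!r}")
```

Saving such snapshots for the newly added characters and comparing them across data updates is one way to spot a value that changed before it becomes permanent.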