BETA Unicode 6.0.0
The next version of the Unicode Standard will be Version 6.0.0, planned for release on
October 11, 2010. A beta version of the 6.0.0 Unicode Character Database files is available for public comment. We strongly encourage implementers to review the summary description, download the beta 6.0.0 Unicode Character Database files, and test their programs with the new data, well before the end of the beta period. Beta code charts are also available for review. We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.0 and to ensure that there are no regressions in glyph shapes for previously encoded characters.
The Version 6.0 draft of Chapter 3, Conformance, is also posted for
review. Please note that definitions D110 and D111 have revised wording. Users of Unicode should take advantage of this opportunity to
provide feedback.
Summary description | Unicode 6.0.0 |
Data files | http, ftp |
Summary of beta charts | Readme.txt |
Single-block charts with yellow highlighting for new characters | http, ftp |
Single-block charts for all of Unicode 6.0 | http, ftp |
Code charts (single download, 95 MB) | http, ftp |
Related Unicode Technical Standards
In addition to the Unicode Standard proper, two other Unicode Technical
Standards have significant text and data file updates that are
correlated with the new additions for Unicode 6.0. Review of that text
and data is also encouraged during the beta review period.
Unicode Collation Algorithm (UCA) |
Unicode IDNA Compatibility Processing |
Review and Comments
For guidance on how to focus your review, see the section
Notable Issues for Beta Testers below.
Any comments on the beta Unicode 6.0.0, the UCD 6.0.0, or the
6.0.0 UAXes and UTSes should be reported using the Unicode
reporting form, referring to Public Review Issue #170. The comment
period ends August 2, 2010.
All substantive comments must be received by that date for
consideration at the August UTC meeting. Editorial comments (typos,
etc.) may still be submitted after that date for consideration in the final
editorial work.
Note: All beta files may be updated, replaced, or
superseded by other files at any time. The beta files will be
discarded once Unicode 6.0.0 is final. It is inappropriate to cite
these files as other than a work in progress. No
products or implementations should be released based on the beta
UCD data files; use only the final, approved Version 6.0.0 data
files, expected to be released on October 11, 2010.
The Unicode Consortium provides early access to updated versions of the data files
and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of
Version 6.0.0.
The assignment of characters for Unicode 6.0.0 is now stable. There will be no further additions or changes to code point assignments.
One of the main purposes of the beta review period, however, is to verify and
correct the preliminary character property assignments in the Unicode Character
Database. Reviewers should check for property changes to existing Unicode 5.2.0
characters, as well as the property values for the new Unicode 6.0.0 character
additions. To facilitate verification of the property changes and additions, diffable XML versions of the Unicode Character Database are available. These XML
files are dated, so that people can check the details of changes that occurred
during the beta review period. The XML files are in the
http://www.unicode.org/Public/6.0.0/diffs/ directory. For more information,
see the diffs.readme.txt file.
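As a complement to the XML diffs, property changes can also be spotted by comparing the plain-text UCD files from two versions. The sketch below is illustrative only: it assumes the simple two-field layout used by files such as Blocks.txt and Scripts.txt (a code point or range, a semicolon, a value, and optional `#` comments). The names `parse_ucd_property` and `diff_properties` are hypothetical and are not part of any Unicode tool.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_ucd_property(lines: Iterable[str]) -> Dict[int, str]:
    """Map each code point to its property value.

    Assumes the two-field UCD layout, e.g. "1BC0..1BFF; Batak",
    where '#' starts a comment. Other UCD files have more fields.
    """
    values: Dict[int, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        field0, field1 = (f.strip() for f in line.split(";")[:2])
        if ".." in field0:  # a range such as 1BC0..1BFF
            start_hex, end_hex = field0.split("..")
            start, end = int(start_hex, 16), int(end_hex, 16)
        else:  # a single code point
            start = end = int(field0, 16)
        for cp in range(start, end + 1):
            values[cp] = field1
    return values


def diff_properties(
    old: Dict[int, str], new: Dict[int, str]
) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Return {code point: (old value, new value)} for added, changed, or removed entries."""
    changes: Dict[int, Tuple[Optional[str], Optional[str]]] = {}
    for cp, new_value in new.items():
        old_value = old.get(cp)
        if old_value != new_value:
            changes[cp] = (old_value, new_value)
    for cp in old.keys() - new.keys():  # entries that disappeared
        changes[cp] = (old[cp], None)
    return changes
```

For example, loading Scripts.txt from 5.2.0 and from the 6.0.0 beta into `old` and `new` and calling `diff_properties(old, new)` lists every code point whose Script value was added, changed, or removed. Expanding ranges into a dictionary keeps the code simple but uses a lot of memory; comparing ranges directly would scale better.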
The beta review period is a good opportunity to add support for the new
Unicode 6.0.0 characters in internal versions of software, so that software can
be tested to verify that the new characters and property assignments don't cause
problems when the software is upgraded to Version 6.0.0 of Unicode.
Notable Issues for Beta Testers
All Unicode Standard Annexes are being modified in
Unicode 6.0.0, often in coordination with property changes. For the
current proposed updates to individual UAXes, see
Proposed Updates for Standard Annexes.
Particular issues in the UAXes are also the focus of specific
Public Review Issues.
Each proposed change in a UAX is highlighted, so that you can focus
your review on those sections if you have limited time. The changes
are also listed in each Modifications section (linked from the table
of contents), so you can check on those areas that might be of most
interest. Some links between beta documents and the proposed
updates for UAXes will not work correctly during the
beta review period. This is a known problem and does
not need to be reported, because such links point to
the eventual final names or revision numbers of the
released versions.
The following blocks are new in Unicode 6.0. Check implementations
carefully for any range or property value assumptions regarding
these new blocks, particularly for the new CJK Extension D.
Block Name | Range |
Alchemical Symbols | U+1F700..U+1F77F |
Bamum Supplement | U+16800..U+16A3F |
Batak | U+1BC0..U+1BFF |
Brahmi | U+11000..U+1107F |
CJK Unified Ideographs Extension D | U+2B740..U+2B81F |
Emoticons | U+1F600..U+1F64F |
Ethiopic Extended-A | U+AB00..U+AB2F |
Kana Supplement | U+1B000..U+1B0FF |
Mandaic | U+0840..U+085F |
Miscellaneous Symbols And Pictographs | U+1F300..U+1F5FF |
Playing Cards | U+1F0A0..U+1F0FF |
Transport And Map Symbols | U+1F680..U+1F6FF |
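One way to audit range assumptions is to test code points against the new blocks directly. This minimal sketch copies the ranges from the table above; the names `NEW_BLOCKS_6_0`, `new_block_of`, and `utf16_units` are hypothetical. Because most of these blocks lie above U+FFFF, it also shows the check that matters for code assuming one UTF-16 code unit per character.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Ranges copied from the table of blocks new in Unicode 6.0, sorted by start.
NEW_BLOCKS_6_0: List[Tuple[int, int, str]] = sorted([
    (0x1F700, 0x1F77F, "Alchemical Symbols"),
    (0x16800, 0x16A3F, "Bamum Supplement"),
    (0x1BC0, 0x1BFF, "Batak"),
    (0x11000, 0x1107F, "Brahmi"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x1F600, 0x1F64F, "Emoticons"),
    (0xAB00, 0xAB2F, "Ethiopic Extended-A"),
    (0x1B000, 0x1B0FF, "Kana Supplement"),
    (0x0840, 0x085F, "Mandaic"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols And Pictographs"),
    (0x1F0A0, 0x1F0FF, "Playing Cards"),
    (0x1F680, 0x1F6FF, "Transport And Map Symbols"),
])
_STARTS = [start for start, _, _ in NEW_BLOCKS_6_0]


def new_block_of(cp: int) -> Optional[str]:
    """Return the name of the new 6.0 block containing cp, or None."""
    i = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        start, end, name = NEW_BLOCKS_6_0[i]
        if start <= cp <= end:
            return name
    return None


def utf16_units(cp: int) -> int:
    """UTF-16 code units needed for cp: 2 (a surrogate pair) above U+FFFF."""
    return 2 if cp > 0xFFFF else 1
```

For example, `new_block_of(0x1BC0)` returns `"Batak"`, and `utf16_units(0x1F600)` returns 2, so an Emoticons character cannot be stored in a single 16-bit unit.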
Please also check the following carefully:
- the new emoji symbols, especially important for mobile phones, and the related new data file, EmojiSources.txt, which maps the emoji symbols to their original Japanese telco source sets
- the 222 new CJK Unified Ideographs in common use in China and Japan
- the two new provisional properties for support of Indic scripts: IndicMatraCategory and IndicSyllabicCategory
- the new provisional script extension data for use in segmentation, regular expressions, and spoof detection
- the new CJK Compatibility charts
- the UAX changes (http://www.unicode.org/versions/Unicode6.0.0/#UAX_Changes)
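When checking EmojiSources.txt, a small loader makes it easier to compare mappings across beta drops. This sketch assumes semicolon-delimited lines of the form `code point sequence;DoCoMo;KDDI;SoftBank` (Shift-JIS codes, with an empty field when a carrier has no equivalent); confirm that layout against the header of the beta file before relying on it. `parse_emoji_sources` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional, Tuple

# (DoCoMo, KDDI, SoftBank) source codes; None where a field is empty.
Carriers = Tuple[Optional[str], Optional[str], Optional[str]]


def parse_emoji_sources(lines: Iterable[str]) -> Dict[Tuple[int, ...], Carriers]:
    """Map each Unicode code point sequence to its carrier source codes.

    Assumed line layout: "<hex code points>;<DoCoMo>;<KDDI>;<SoftBank>",
    with '#' starting a comment.
    """
    table: Dict[Tuple[int, ...], Carriers] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        fields += [""] * (4 - len(fields))  # tolerate missing trailing fields
        seq = tuple(int(h, 16) for h in fields[0].split())  # e.g. a keycap sequence
        docomo, kddi, softbank = (f or None for f in fields[1:4])
        table[seq] = (docomo, kddi, softbank)
    return table
```

Keying the table by the full code point sequence, rather than by a single code point, keeps multi-code-point emoji (such as keycap sequences) distinct from their base characters.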