ALPHA REVIEW Unicode® 17.0.0
The next version of the Unicode Standard will be Version 17.0.0, planned for release on
September 9, 2025. In addition to new characters, the plan
for this version includes one new Unicode Technical Standard
and updates to several Unicode Standard Annexes.
This version will include significant new repertoire additions.
The planned repertoire adds a total of 4836 new characters. Note, however, that
during the alpha review period, the repertoire is not yet frozen. The main goal of the alpha
review is to ensure that all repertoire, including character names and glyphs, is correct and
appropriate for the final release.
An alpha version of the 17.0.0 Unicode Character Database files is available for public review.
We strongly encourage implementers to review the summary description,
download the alpha 17.0.0 Unicode Character Database files,
and test their programs with the new data, well before the end of the alpha period. It is especially important
to review the Notable Issues for Alpha Reviewers.
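As a starting point for such testing, the following minimal sketch reads a UCD file in the common `range ; value # comment` layout, such as DerivedAge.txt, and collects the code points carrying a given value. The function names and the sample lines are illustrative assumptions, not part of the UCD. With the alpha DerivedAge.txt, passing "17.0" as the age would be expected to list the newly added code points.

```python
from typing import Iterable, Iterator


def parse_ucd_ranges(lines: Iterable[str]) -> Iterator[tuple[int, int, str]]:
    """Yield (first, last, value) from lines like 'XXXX..YYYY ; value # comment'."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2:
            continue
        first, _, last = fields[0].partition("..")
        # A single code point has no '..' part, so the range is (first, first).
        yield int(first, 16), int(last or first, 16), fields[1]


def code_points_with_age(lines: Iterable[str], age: str) -> list[int]:
    """Return every code point whose value field equals the requested age."""
    return [
        cp
        for first, last, value in parse_ucd_ranges(lines)
        if value == age
        for cp in range(first, last + 1)
    ]


# Sample input in the DerivedAge.txt layout, used only to exercise the parser.
SAMPLE_AGE_LINES = [
    "# DerivedAge sample",
    "0000..001F    ; 1.1 #  [32] <control>..<control>",
    "20AC          ; 2.1 #       EURO SIGN",
]

if __name__ == "__main__":
    print([f"U+{cp:04X}" for cp in code_points_with_age(SAMPLE_AGE_LINES, "2.1")])
```

In practice the lines would come from the downloaded alpha file, for example `open("DerivedAge.txt", encoding="utf-8")`.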
We encourage users to check the code charts carefully
to verify correctness of the new characters added to Unicode 17.0.0 and to ensure
that there are no regressions
in glyph shapes for previously encoded characters.
Unicode Standard Annexes (proposed updates)
If an annex is not listed, no proposed update is available for review yet. This
situation may occur when no significant change is planned for that annex for a particular
release.
Related Unicode Technical Standards (proposed updates)
In addition to the Unicode Standard proper, several Unicode Technical
Standards have significant text and data file updates that are
correlated with the new additions for Unicode 17.0.0. Review of that text
and data is also encouraged during the alpha review period.
Review and Feedback
For guidance on how to focus your review, see the section
Notable Issues for Alpha Reviewers.
Any feedback should be
reported using the contact form.
Comments on the Unicode Standard Version 17.0.0
or the Unicode Character Database data files should refer to the alpha review
Public Review Issue #514.
Comments on specific Version 17.0.0 UAXes and UTSes should refer to the respective
Public Review Issue Numbers
for each document, where available.
The comment period ends
April 2, 2025.
All substantive technical comments must have been received by that date for
consideration at the April UTC meeting. Editorial comments (typos,
etc.) may still be submitted after that date for consideration in further
editorial work.
Note: All alpha files may be updated, replaced, or
superseded by other files at any time. The alpha files will be
discarded once Unicode 17.0.0 is final. It is inappropriate to cite
these files as other than a work in progress. No
products or implementations should be released based on the alpha
UCD data files—use only the final, approved Version 17.0.0 data
files, expected on September 9, 2025.
The Unicode Consortium provides early access to updated versions of the data files
and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of
Version 17.0.0.
Notable Issues for Alpha Reviewers
The focus of this alpha review is the new repertoire planned for encoding.
Reviewers should concentrate on the new repertoire shown in the code charts,
verifying that it is appropriate and complete, and that new character names and
glyphs are correct.
This list of notable issues briefly mentions other aspects of the 17.0
release, to provide more context. Further details regarding updates to
annexes, UTSes, and data files will be provided during beta review.
Changes to Unicode Standard Annexes
Some of the Unicode Standard Annexes have modifications for
Unicode 17.0.0, often in coordination with changes to character properties.
See the Modifications section of each Annex for details of the relevant changes.
Changes to Synchronized Unicode Technical Standards
- UTS #10: The test data documentation, formerly specified in CollationTest.html, has
been moved to a new section inside UTS #10.
- UTS #58: This is proposed as a new UTS, synchronized with Unicode 17.0. It specifies rules
for linkifying URLs embedded in text.
Core Specification Update
The alpha review draft core specification is available as per-chapter web pages.
Reviewers should carefully check for inadvertent changes in the text, in particular in glyph examples.
The text still contains a number of editor's notes, indicating both general information for
reviewers and spots in the text that are not yet complete for Unicode 17.0. Please use those
notes as guidance; there is no need to report omissions or defects that the editors already know about and are actively working on.
Script-specific Issues
There are five new scripts encoded in Unicode 17.0. Some of these scripts,
such as Tai Yo, have complex layout.
Numeric Property Issues
- There are two new sets of decimal digits added in Unicode 17.0,
for newly encoded scripts: Tolong Siki and Chisoi. Implementations of
numeric values and numeric formatting
should take these new sets into account.
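One way to avoid hard-coding digit sets is to build the digit table from UnicodeData.txt itself, so that new sets are picked up when the data file is updated. The sketch below assumes the standard semicolon-separated UnicodeData.txt layout, with General_Category in field 2 and the decimal digit value in field 6. The function names are hypothetical, and the sample lines cover only previously encoded digits. A runtime's built-in digit handling typically reflects the Unicode version it was compiled against, so it may not recognize the new sets until it is upgraded.

```python
from typing import Iterable


def load_decimal_digits(lines: Iterable[str]) -> dict[str, int]:
    """Map each Nd character in UnicodeData.txt-style lines to its digit value."""
    digits: dict[str, int] = {}
    for line in lines:
        fields = line.strip().split(";")
        # Field 2 is General_Category; field 6 is the decimal digit value.
        if len(fields) >= 7 and fields[2] == "Nd" and fields[6] != "":
            digits[chr(int(fields[0], 16))] = int(fields[6])
    return digits


def parse_decimal(text: str, digits: dict[str, int]) -> int:
    """Parse a non-empty string of decimal digits from any script in the table."""
    if not text:
        raise ValueError("empty digit string")
    value = 0
    for ch in text:
        if ch not in digits:
            raise ValueError(f"not a known decimal digit: U+{ord(ch):04X}")
        value = value * 10 + digits[ch]
    return value


# Sample lines for previously encoded characters, used only as test input.
SAMPLE_DIGIT_LINES = [
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0661;ARABIC-INDIC DIGIT ONE;Nd;0;AN;;1;1;1;N;;;;;",
    "0662;ARABIC-INDIC DIGIT TWO;Nd;0;AN;;2;2;2;N;;;;;",
]

if __name__ == "__main__":
    table = load_decimal_digits(SAMPLE_DIGIT_LINES)
    print(parse_decimal("\u0661\u0662", table))
```

This sketch does not reject strings that mix digits from different scripts; an implementation that cares about that would need an additional check.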
Unihan-related Issues
All Unihan
properties should be reviewed carefully. The following changes
deserve special attention:
- A large new extension block has been added: CJK Unified Ideographs Extension J.
- Five urgently needed characters (UNC) have been added at the end of the CJK Extension C block,
so implementers should check any hard-coded assumptions about the range of Extension C.
- New source prefixes have been added to the Unihan database for values for
the kIRG_GSource, kIRG_SSource, and kIRG_TSource properties.
See UAX #38 for further details on these changes, especially Section 4.2, Listing
by Date of Addition to the Unicode Standard, and Section 4.3, Listing by
Location within Unihan.zip.
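For the Extension C point above, one option is to derive the assigned range from UnicodeData.txt rather than embedding its end point in code. The sketch below assumes the First/Last range-line convention of UnicodeData.txt. The function name is hypothetical, and the end point in the sample is test input only; it should not be read as the Unicode 17.0.0 value.

```python
import re
from typing import Iterable

# Matches names such as '<CJK Ideograph Extension C, First>'.
RANGE_NAME = re.compile(r"^<(.+), (First|Last)>$")


def load_named_ranges(lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {range name: (first, last)} from First/Last line pairs."""
    starts: dict[str, int] = {}
    ranges: dict[str, tuple[int, int]] = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue
        match = RANGE_NAME.match(fields[1])
        if not match:
            continue
        name, kind = match.groups()
        code_point = int(fields[0], 16)
        if kind == "First":
            starts[name] = code_point
        else:
            # Raises KeyError if a Last line has no preceding First line.
            ranges[name] = (starts.pop(name), code_point)
    return ranges


# Sample range lines; the Last value is illustrative test input only.
SAMPLE_RANGE_LINES = [
    "2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;",
    "2B739;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;",
]

if __name__ == "__main__":
    first, last = load_named_ranges(SAMPLE_RANGE_LINES)["CJK Ideograph Extension C"]
    print(f"U+{first:04X}..U+{last:04X}")
```

Run against the alpha UnicodeData.txt, the same lookup would reflect whatever end point the draft data currently specifies.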
Standardized Variation Sequences
- Four variation sequences have been added for quotation marks (U+2018, U+2019,
U+201C, U+201D) to deal with vertical layout considerations in Sibe (Mongolian) text.
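To check whether a given data snapshot contains sequences for these quotation marks, a reviewer could filter StandardizedVariants.txt by base character. The sketch below assumes the usual layout of that file, with the sequence, a description, and an optional shaping context separated by semicolons, followed by a comment. The names are hypothetical, and the sample line is an existing entry used only as test input.

```python
from typing import Iterable

# Base characters named in the text above.
QUOTATION_MARKS = {0x2018, 0x2019, 0x201C, 0x201D}


def load_variation_sequences(lines: Iterable[str]) -> list[tuple[int, int, str]]:
    """Return (base, selector, description) for each two-code-point sequence."""
    sequences = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        code_points = [int(cp, 16) for cp in fields[0].split()]
        if len(code_points) != 2:
            continue
        description = fields[1] if len(fields) > 1 else ""
        sequences.append((code_points[0], code_points[1], description))
    return sequences


def sequences_for(sequences, bases):
    """Keep only the sequences whose base character is in the given set."""
    return [seq for seq in sequences if seq[0] in bases]


# An existing entry, used only to exercise the parser.
SAMPLE_VARIANT_LINES = [
    "# StandardizedVariants sample",
    "0030 FE00; short diagonal stroke form; # DIGIT ZERO",
]

if __name__ == "__main__":
    parsed = load_variation_sequences(SAMPLE_VARIANT_LINES)
    print(sequences_for(parsed, QUOTATION_MARKS))
```

Against the alpha StandardizedVariants.txt, the filtered list would be expected to contain the four new sequences.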
Code Charts
As always, careful review of the updated code charts for Version 17.0.0 is advised.
Particular issues to take note of include:
Collation-related Issues
The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 17.0.0
repertoire for UCA 17.0.0. For the most part, the additions for new
characters are unremarkable, but implementations should be checked to ensure
the new additions do not cause problems.
IDNA-related Issues
The listing of the IDNA2008_Category property in Idna2008.txt has been updated to reflect
the planned Unicode 17.0.0 repertoire. Implementers concerned with the stability
of IDNA 2008 should check that data carefully to verify it meets their expectations.
New Data Files
There are no new data files in the UCD for Version 17.0.0.
Data Directory Structure
It is important to note that the directory structure for the UCD
is being updated somewhat for Version 17.0.0. In particular, the data
directories associated with synchronized UTSes starting with Version
17.0.0 have been moved under
the same versioned directory as the UCD proper, rather than being located
at the top level of the /Public directory. This means that the final
release directory for the UCD and data for the synchronized UTSes will
have the same structure as the /Public/draft/ directory used during
alpha and beta review. No data files for prior releases will
be moved — all existing release links are permanently stable.
More details regarding the organization of the data files for Version 17.0.0 will be
available during beta review, and in the various proposed updates for
UAXes and UTSes.
General Issues
For current proposed updates to the particular UAXes, see
Proposed Updates for Standard Annexes.
Particular issues in the UAXes may also be the focus of specific
Public Review Issues.
Each proposed textual change in a UAX is highlighted, so that you can focus
your review on those sections if you have limited time. The changes
are also listed in detail in the Modifications sections (linked from the table
of contents of each document), and are summarized in
UAX changes,
so you can check on those areas that might be of most
interest.
Some links between alpha documents and the proposed
updates for UAXes will not work correctly during the
alpha review period. This is a known problem which does
not need to be reported, as such links point to
the eventual final names or revision numbers for the
released versions.
Note that all links to versioned data files on this alpha review page
use "/Public/17.0.0/" paths. During alpha review, those links are
redirected to the actual development directory at "/Public/draft/". Once
Unicode 17.0.0 has been released, all of these links will then point to
the final permalinks for Unicode 17.0.0 data, so that this alpha review
page for Unicode 17.0.0 does not end up anomalously pointing to draft data
directories for future release development.
Stability
Certain character properties for newly assigned characters cannot be
changed after the formal release of each version of the standard, because of the
Character Encoding Stability Policy.
Such character property values need special attention during the alpha and beta review process, as they
cannot be corrected after publication. These include:
- Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.
- The determination of whether a character is included in identifiers (XID_Start, XID_Continue).
- Case foldings.
- There are also strong constraints on additions and changes to case mappings.
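One way to give the normalization-related properties this attention is to compare two UnicodeData.txt snapshots. The sketch below covers only Canonical_Combining_Class (field 3) and Decomposition_Mapping (field 5). It reports existing code points whose values differ, and new code points with non-default values that merit a close look. The function names are hypothetical, and the "new" character in the sample is invented purely as test input.

```python
from typing import Iterable


def load_norm_fields(lines: Iterable[str]) -> dict[int, tuple[str, str]]:
    """Map code point -> (Canonical_Combining_Class, Decomposition_Mapping)."""
    table: dict[int, tuple[str, str]] = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 6:
            continue
        # Skip First/Last range markers; they do not describe single characters.
        if fields[1].endswith((", First>", ", Last>")):
            continue
        table[int(fields[0], 16)] = (fields[3], fields[5])
    return table


def compare_snapshots(old, new):
    """Return (changed existing code points, new code points with non-default values)."""
    changed = sorted(cp for cp in old.keys() & new.keys() if old[cp] != new[cp])
    added = sorted(cp for cp in new.keys() - old.keys() if new[cp] != ("0", ""))
    return changed, added


OLD_SAMPLE = [
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;",
    "0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;",
]
# The last line is a hypothetical entry, invented for this test only.
NEW_SAMPLE = OLD_SAMPLE + [
    "F0000;HYPOTHETICAL NEW MARK;Mn;230;NSM;;;;;N;;;;;",
]

if __name__ == "__main__":
    changed, added = compare_snapshots(
        load_norm_fields(OLD_SAMPLE), load_norm_fields(NEW_SAMPLE)
    )
    print("changed:", [f"U+{cp:04X}" for cp in changed])
    print("added with non-default values:", [f"U+{cp:04X}" for cp in added])
```

The other properties listed above, such as XID_Start, XID_Continue, and case foldings, live in different data files and would need their own comparison.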