Unicode® 13.0.0 (DRAFT)
This page summarizes the important changes for the Unicode Standard, Version 13.0.0.
This version will supersede all previous versions of the Unicode Standard.
The Unicode Character Database, Code Charts, and Annexes for Version 13.0 will be released on March 10, 2020. The core specification (the PDF chapters) of Version 13.0 is still pending publication due to the extensive editorial work required for the new content additions. Until final publication, the links to individual chapters of the core specification will not be activated. An announcement will be made when the core specification for Version 13.0 is available. In the meantime, implementers can continue to reference the relevant sections of the most recent version of the core specification.
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration
Unicode 13.0 adds 5,930 characters,
for a total of 143,859 characters.
These additions include 4 new scripts,
for a total of 154 scripts, as well as 55 new emoji characters.
The new scripts and characters in Version 13.0 add support for lesser-used languages
and unique writing requirements worldwide, including numerous symbol additions.
Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:
- Yezidi, historically used in Iraq and Georgia for liturgical purposes,
with some modern revival of usage
- Chorasmian, historically used in Central Asia across Uzbekistan, Kazakhstan,
and Turkmenistan to write a now extinct Eastern Iranian language
- Dives Akuru, historically used in the Maldives until the 20th century
- Khitan Small Script, historically used in Northern China
- Arabic script additions used to write Hausa in Africa, and other
additions used to write Hindko and Punjabi in Pakistan
- A character used in Syloti Nagri in South Asia
- Bopomofo additions used for Cantonese
Popular symbol additions:
- 55 emoji characters. For complete statistics regarding all emoji as of Unicode 13.0, see Emoji Counts. For more information about emoji additions in version 13.0, including new emoji ZWJ sequences and emoji modifier sequences, see Emoji Recently Added, v13.0.
Other symbol additions include:
- Creative Commons license symbols that are used to describe functions, permissions, and concepts related to intellectual property that have extensive use on the web
- Vietnamese reading marks that mark ideographs as having a distinct, colloquial reading
- 214 graphic characters that provide compatibility with various home computers from the mid-1970s to the mid-1980s and with early teletext broadcasting standards
Additional support for lesser-used languages and scholarly work was extended worldwide, including:
- A character used in Sinhala to write Sanskrit
Important glyph corrections, including:
Several other important Unicode specifications have been updated for Version 13.0.
The following four Unicode Technical Standards are versioned in
synchrony with the Unicode Standard, because their data files cover the same repertoire.
All have been updated to Version 13.0:
Some of the changes in Version 13.0 and associated Unicode Technical Standards
may require modifications
to implementations. For more information, see the migration and modification sections of
UTS #10, UTS #39, UTS #46, and UTS #51.
This version of the Unicode Standard is also synchronized with
ISO/IEC 10646:2020, sixth edition,
plus the following additions:
See Sections D through H below for additional details regarding the changes in this version of
the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.
Version 13.0 of the Unicode Standard consists of:
- The core specification
- The code charts (delta and archival) for this version
- The Unicode Standard Annexes
- The Unicode Character Database (UCD)
The core specification gives the general principles,
requirements for conformance, and guidelines for implementers. The
code charts show representative glyphs for all the Unicode
characters. The Unicode Standard Annexes supply detailed normative
information about particular aspects of the standard. The Unicode
Character Database supplies normative and informative data for
implementers to allow them to implement the Unicode Standard.
The core specification is available as
a single PDF for viewing.
Links are also available
in the navigation bar on the left of this page to access
individual chapters and appendices
of the core specification.
Several sets of code charts are available. They serve different purposes:
- The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.
For Unicode 13.0.0 in particular, two additional sets of code chart pages are provided:
- A set of delta code charts showing the
new blocks and any blocks in which characters were added for Unicode 13.0.0. The new characters are visually highlighted in the charts.
- A set of archival code charts that represents
the entire set of characters, names and representative glyphs at the time of publication of Unicode 13.0.0.
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
Links to the individual
Unicode Standard Annexes are available in
the navigation bar on the left of this page. The list of significant changes
in the content of the Unicode Standard Annexes for Version 13.0 can be found
in Section G below.
Data files for Version 13.0 of
the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap
to the functions of the various subdirectories.
Zipped versions of the UCD
for bulk download are available, as well.
Version 13.0.0 of the Unicode Standard
should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 13.0.0, (Mountain View, CA: The Unicode Consortium,
2020. ISBN 978-1-936213-26-9)
The terms “Version 13.0” or “Unicode 13.0” are abbreviations for the full version reference, Version 13.0.0.
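The relationship between the abbreviated and full version references can be sketched in code. The following is a minimal Python sketch, not part of the standard: the helper names `parse_unicode_version` and `covers` are hypothetical, and the only library feature used is the standard `unicodedata.unidata_version` string, which reports the version of the Unicode Character Database bundled with the running Python interpreter.

```python
import unicodedata


def parse_unicode_version(text):
    """Return (major, minor, update); "13.0" is treated as "13.0.0"."""
    parts = [int(p) for p in text.strip().split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected 'X.Y' or 'X.Y.Z', got {text!r}")
    # Pad the abbreviated form with a zero update version.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def covers(runtime_version, required="13.0.0"):
    """True if runtime_version is at least the required version."""
    return parse_unicode_version(runtime_version) >= parse_unicode_version(required)


if __name__ == "__main__":
    # Report whether this interpreter's character database covers 13.0.0.
    bundled = unicodedata.unidata_version
    print(bundled, covers(bundled))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "9.0.0" as later than "13.0.0".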
The citation and permalink for the latest published version of the Unicode Standard are:
The Unicode Consortium. The Unicode Standard.
A complete specification of the contributory files for Unicode
13.0 is found on the page Components for 13.0.0.
That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.
Errata incorporated into Unicode 13.0 are listed by date in
a separate table. For corrigenda and errata after the release of Unicode 13.0, see the list of current
Updates and Errata.
There were no significant changes to the Stability Policy of the core specification between Unicode 12.1 and Unicode 13.0.
Four new scripts were added with accompanying new block descriptions:
|Khitan Small Script
Changes in the Unicode Standard Annexes are listed in Section G.
Character Assignment Overview
5,930 characters have been added.
Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see delta code charts.
There are no significant new conformance requirements in Unicode 13.0.
The detailed listing of all changes to the contributory data files of the Unicode Character Database
for Version 13.0 can be found in
UAX #44, Unicode Character Database.
The changes listed there include character additions and property revisions to existing characters that will affect implementations.
Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M.
In Version 13.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section
of each UAX, linked directly from the following list of UAXes.
|Unicode Standard Annex
Unicode Bidirectional Algorithm
|No significant changes in this version.
East Asian Width
|No significant changes in this version.
Unicode Line Breaking Algorithm
|Rule LB22 was changed to simply disallow breaking before ellipsis.
Rule LB30 was changed to exclude full-width CP and OP.
Unicode Normalization Forms
|The explanation of script-specific exclusions was updated in Section 5.1,
Composition Exclusion Types.
Unicode Script Property
|Yezi was added to the scx set listed for U+060C.
Unicode Text Segmentation
|A number of adjustments were made to values in Table 3, Word_Break Property Values.
For consistency, prepended concatenation marks were omitted from the definition of
Control in Table 2, Grapheme_Cluster_Break Property Values. Unnecessary
external references to UTS #51 were removed.
Unicode Identifier and Pattern Syntax
|A qualification was added to the example under UAX31-R1. Default Identifiers.
Table 4 was retitled to "Excluded Scripts". Four new scripts were added to Table 4.
Rows in Table 4 dedicated to character exclusions not directly associated with
scripts were removed from the table, and those exclusions were moved to the
derivation rules associated with UTS #39, Unicode Security Mechanisms.
Unicode Named Character Sequences
|No significant changes in this version.
Unicode Han Database (Unihan)
|The regular expressions for most of the existing IRG Source fields were updated.
Documentation was added for new fields: kIRG_SSource, kIRG_UKSource, kTGHZ2013,
kUnihanCore2020, and kSpoofingVariant. Documentation was removed for the obsolete fields: kRSJapanese,
kRSKanWa, and kRSKorean. The format of the tables in Sections 4.2, 4.3, and 4.4
was revised for legibility, and a "Count" column was added to the tables in
Common References for Unicode Standard Annexes
|All references were updated for Unicode 13.0.
Unicode Character Database in XML
|New code point attributes, values, and patterns were added.
Unicode Character Database
|Documentation was added for emoji properties. Documentation of the new ccc=6 value
was added in Table 15. The Khitan Small Script was added to the list of scripts
whose Name property is derived by rule. A note was added indicating that code
point labels are included in the scope of the matching rule UAX44-LM2. There were
also numerous other small editorial improvements to the text.
U-Source Ideographs
|A table was added summarizing U-source prefixes. The "UCI" prefix was marked
as obsolete. The semantics of the "UK" prefix were clarified. References to
SAT-sourced ideographs were removed. A "Comp" value was added
to the list of possible status values. UNC-2013 and UNC-2015 status values were removed.
A description of the radical-stroke charts associated with the U-Source ideographs
Unicode Vertical Text Layout
|Section 3.2 was significantly reorganized, with new content added regarding
layout issues for squared Katakana and ideographic words. Horizontal and vertical glyphs were
added for U+32FF SQUARE ERA NAME REIWA.
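The matching rule UAX44-LM2 mentioned in the Unicode Character Database entry above governs loose matching of character names: case, whitespace, underscores, and medial hyphens are ignored, except for the hyphen in U+1180 HANGUL JUNGSEONG O-E. The following is a simplified Python sketch of that idea, not a conformant implementation. The function name `loose_name_key` is hypothetical, and a medial hyphen is approximated here as a hyphen directly between two ASCII letters or digits, whereas the rule itself is defined relative to the normative character names.

```python
import re

# Assumption: a "medial" hyphen is one directly between two alphanumerics.
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")

# The one name whose medial hyphen must be preserved under UAX44-LM2.
_HANGUL_O_E = "HANGULJUNGSEONGO-E"


def loose_name_key(name):
    """Return a comparison key for loose matching of character names."""
    # Ignore case; treat underscores like whitespace and collapse runs.
    normalized = re.sub(r"[\s_]+", " ", name.strip().upper())
    # Keep the hyphen that distinguishes U+1180 (O-E) from U+116C (OE).
    if normalized.replace(" ", "") == _HANGUL_O_E:
        return _HANGUL_O_E
    # Drop medial hyphens before removing spaces, so that a hyphen that
    # follows a space (as in TIBETAN LETTER -A) is never treated as medial.
    without_hyphens = _MEDIAL_HYPHEN.sub("", normalized)
    return without_hyphens.replace(" ", "")


if __name__ == "__main__":
    print(loose_name_key("zero-width space") == loose_name_key("ZERO WIDTH SPACE"))
    print(loose_name_key("TIBETAN LETTER -A") == loose_name_key("TIBETAN LETTER A"))
```

Two names match loosely when their keys are equal. Removing hyphens before spaces matters: doing it in the opposite order would turn the non-medial hyphen in a name such as TIBETAN LETTER -A into an apparently medial one.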
There are also significant revisions in the Unicode Technical Standards whose
versions are synchronized with the Unicode Standard. The most important of these changes are listed below.
For the full details of all changes, see the Modifications section
of each UTS, linked directly from the following list of UTSes.
|Unicode Technical Standard
Unicode Collation Algorithm
|Khitan Small Script and the new Tangut Supplement block were added to the
specification for computing implicit weights in Table 16.
Unicode Security Mechanisms
|This update systematically corrected various citations of "IdentifierType", "Identifier Type"
and "Type" to use "Identifier_Type" consistently, and similarly for "Identifier_Status".
Definitions of Identifier_Type values were clarified in Table 1.
Unicode IDNA Compatibility Processing
|No significant changes in this version.
Unicode Emoji
|A new section was added on how to use ZWJ sequences to change the color of base emoji,
to represent emoji such as black cat. Five characters were removed from the explicit
gender table, since they were made gender-neutral. RGI sequences were added, showing more skin tone
combinations for people holding hands. A definition was added for emoji component.
Color was added to the specification of the order of elements in emoji ZWJ sequences.
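The use of a ZWJ sequence to change the color of a base emoji, as described for black cat above, can be illustrated with a short Python sketch. The helper names `zwj_sequence` and `code_points` are hypothetical. The sequence shown joins U+1F408 CAT and U+2B1B BLACK LARGE SQUARE with U+200D ZERO WIDTH JOINER; whether it displays as a single black cat glyph depends on font and platform support.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

CAT = "\U0001F408"             # base emoji
BLACK_LARGE_SQUARE = "\u2B1B"  # color element


def zwj_sequence(*elements):
    """Join emoji elements with ZERO WIDTH JOINER."""
    return ZWJ.join(elements)


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


black_cat = zwj_sequence(CAT, BLACK_LARGE_SQUARE)

if __name__ == "__main__":
    print(code_points(black_cat))  # U+1F408 U+200D U+2B1B
```

An implementation that lacks support for the sequence is expected to fall back to showing the two emoji separately, so the base emoji remains readable.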
There are a significant number of changes in Unicode 13.0 that may impact implementations upgrading to Version 13.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.
Four new scripts have been added in Unicode 13.0.0. Some of these scripts have
particular attributes which may cause issues for implementations. The more
important of these attributes are summarized here.
General Character Property Changes
Standardized Variation Sequences
New Data Files Added to the UCD