Unicode® 11.0.0 (DRAFT)
This page summarizes the important changes for the Unicode Standard, Version 11.0.0.
This version supersedes all previous versions of the Unicode Standard.
The Unicode Character Database, Code Charts, and Annexes for Version 11.0 will be released on June 5, 2018. The core specification (the PDF chapters) of Version 11.0 is still pending publication due to the extensive editorial work required for the new content additions. Until final publication, the links to individual chapters of the core specification will not be activated. An announcement will be made when the core specification for Version 11.0 is available. In the meantime, implementers can continue to reference the relevant sections of the most recent version of the core specification.
B. Technical Overview
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards
M. Implications for Migration
Unicode 11.0 adds 684 characters, for a total of 137,374 characters.
These additions include 7 new scripts,
for a total of 146 scripts, as well as 66 new emoji characters.
The new scripts and characters in Version 11.0 add support for lesser-used languages and unique written requirements worldwide. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:
- Dogra, used to write historic Dogra in South Asia
- Georgian Mtavruli capital letters, newly added to support modern casing practices
- Gunjala Gondi, used to write the Adilabad dialect of the Gondi language in South Asia
- Hanifi Rohingya, used to write the modern Rohingya language in Southeast Asia
- Makasar, used to write historic Makasar in Indonesia
- Medefaidrin, used for modern liturgical purposes in Africa
- Old Sogdian, used to write historic Sogdian in the third to fifth centuries in Central Asia
- Sogdian, used to write historic languages in the seventh to fourteenth centuries in Central Asia
- Five urgently needed CJK unified ideographs: three for newly standardized names of chemical elements, and two for Japan's government administration Moji Joho Kiban Project that includes ideographs for personal and place names
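The Georgian Mtavruli addition changes case mapping for existing Mkhedruli letters, so it is worth checking what a given runtime does. The following is a minimal sketch in Python, assuming the interpreter's `unicodedata` tables are at Unicode 11.0 or later (Python 3.7 and newer). The helper names `runtime_unicode_version` and `georgian_case_mappings` are illustrative only and not part of any standard API.

```python
import unicodedata

MKHEDRULI_AN = "\u10D0"  # GEORGIAN LETTER AN
MTAVRULI_AN = "\u1C90"   # GEORGIAN MTAVRULI CAPITAL LETTER AN (new in 11.0)


def runtime_unicode_version():
    """Return the Unicode version of this interpreter's tables as a tuple."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def georgian_case_mappings():
    """Report how the runtime maps one Mkhedruli/Mtavruli pair.

    Returns None when the runtime predates Unicode 11.0, because the
    Mtavruli letters are unassigned there.
    """
    if runtime_unicode_version() < (11, 0, 0):
        return None
    return {
        "upper": MKHEDRULI_AN.upper(),  # uppercase of the Mkhedruli letter
        "lower": MTAVRULI_AN.lower(),   # lowercase of the Mtavruli letter
        "title": MKHEDRULI_AN.title(),  # titlecase of the Mkhedruli letter
    }


print(unicodedata.unidata_version, georgian_case_mappings())
```

On a Unicode 11.0 or later runtime, `upper` should be the Mtavruli letter and `lower` the Mkhedruli letter. The `title` entry is expected to return the Mkhedruli letter unchanged, so title-casing should not produce words that mix the two styles.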
Popular symbol additions:
- 66 emoji characters, including 4 new emoji components for hair color.
For complete statistics regarding all emoji as of Unicode 11.0, see Emoji Counts.
[Caution: Those counts may not yet be final
until the publication of the approved UTS #51, Version 11.0]
For more information about emoji additions for Unicode 11.0, including
new emoji ZWJ sequences and emoji modifier sequences, see Emoji Recently Added, v11.0.
- Copyleft symbol
- Half stars for rating systems
- Additional astrological symbols
- Xiangqi Chinese chess symbols
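The four hair components are meant to be combined with a person emoji through a zero width joiner. As a rough illustration in Python (the helper names `zwj_sequence` and `code_points` are made up for this sketch), the sequence for "woman: red hair" can be assembled and inspected like this:

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

WOMAN = "\U0001F469"

# The four emoji components for hair added in Unicode 11.0.
HAIR_COMPONENTS = {
    "red hair": "\U0001F9B0",
    "curly hair": "\U0001F9B1",
    "bald": "\U0001F9B2",
    "white hair": "\U0001F9B3",
}


def zwj_sequence(base, component):
    """Join a base emoji and a component into an emoji ZWJ sequence."""
    return base + ZWJ + component


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


woman_red_hair = zwj_sequence(WOMAN, HAIR_COMPONENTS["red hair"])
print(code_points(woman_red_hair))  # ['U+1F469', 'U+200D', 'U+1F9B0']
print(len(woman_red_hair))          # 3 code points, one intended glyph
```

The sequence is three code points long even though it is meant to display as a single image, which is why code that counts or truncates by code point can split such emoji.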
Additional support for lesser-used languages and scholarly work was extended worldwide, including:
- For the Mazahua language, a Mesoamerican language recognized by law in Mexico
- For Mayan numerals used in printed materials in Central America
- For Sanskrit manuscripts written in Bengali
- For Gurmukhi manuscripts
- For historic documents of the Buryats of the Barguzin Steppe
- For Japanese linguistic studies
Version 11.0 improved segmentation support by adding explicit consideration of the handling of Indic script viramas to the processing for extended grapheme clusters. The statements of emoji-related rules for grapheme cluster boundaries and for word boundaries have also been simplified.
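Several other important Unicode specifications have been updated for Version 11.0.
The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. All have been updated to Version 11.0:
- UTS #10, Unicode Collation Algorithm
- UTS #39, Unicode Security Mechanisms
- UTS #46, Unicode IDNA Compatibility Processing
- UTS #51, Unicode Emoji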
Some of the changes in Version 11.0 and associated Unicode Technical Standards
may require modifications
to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.
This version of the Unicode Standard is also synchronized with ISO/IEC 10646:2017, fifth edition,
plus Amendment 1 to the fifth edition,
plus the following additions from Amendment 2 to the fifth edition:
- 46 Mtavruli Georgian capital letters
- 5 urgently needed CJK unified ideographs
- 66 emoji characters
See Sections D through H below for additional details regarding the changes in this version of
the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.
Version 11.0 of the Unicode Standard consists of:
- The core specification
- The code charts (delta and archival) for this version
- The Unicode Standard Annexes
- The Unicode Character Database (UCD)
The core specification gives the general principles,
requirements for conformance, and guidelines for implementers. The
code charts show representative glyphs for all the Unicode
characters. The Unicode Standard Annexes supply detailed normative
information about particular aspects of the standard. The Unicode
Character Database supplies normative and informative data for
implementers to allow them to implement the Unicode Standard.
The core specification is available as a single PDF for viewing.
Links to individual chapters and appendices of the core specification are also available in the navigation bar on the left of this page.
Several sets of code charts are available. They serve different purposes.
- The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.
For Unicode 11.0.0 in particular, two additional sets of code chart pages are provided:
- A set of delta code charts showing the
new blocks and any blocks in which characters were added for Unicode 11.0.0. The new characters are visually highlighted in the charts.
- A set of archival code charts that represents
the entire set of characters, names and representative glyphs at the time of publication of Unicode 11.0.0.
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
Links to the individual
Unicode Standard Annexes are available in
the navigation bar on the left of this page. The list of significant changes
in the content of the Unicode Standard Annexes for Version 11.0 can be found
in Section G below.
Data files for Version 11.0 of
the Unicode Character Database are available. The ReadMe.txt in that directory provides a roadmap
to the functions of the various subdirectories.
Zipped versions of the UCD
for bulk download are available, as well.
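Most implementers consume the UCD by parsing its text files. The following Python sketch reads one line in the semicolon-separated, 15-field layout of UnicodeData.txt and keeps a few of the fields. The record type and function names are hypothetical, and the sample line is included only to show the layout. Range markers such as `<CJK Ideograph, First>` are not handled here.

```python
from typing import NamedTuple, Optional


class UnicodeDataRecord(NamedTuple):
    """A few of the fields from one UnicodeData.txt line."""
    code_point: int
    name: str
    general_category: str
    simple_uppercase: Optional[int]
    simple_lowercase: Optional[int]
    simple_titlecase: Optional[int]


def _hex_or_none(field):
    """Convert a hexadecimal field to int; empty fields become None."""
    return int(field, 16) if field else None


def parse_unicode_data_line(line):
    """Parse one UnicodeData.txt line into a UnicodeDataRecord."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return UnicodeDataRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        simple_uppercase=_hex_or_none(fields[12]),
        simple_lowercase=_hex_or_none(fields[13]),
        simple_titlecase=_hex_or_none(fields[14]),
    )


SAMPLE_LINE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(SAMPLE_LINE)
print(record.name, hex(record.simple_lowercase))
```

Treating empty fields as `None` rather than as the character itself keeps the parsed data faithful to the file; callers can apply the "maps to itself" default where they need it.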
Version 11.0.0 of the Unicode Standard
should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 11.0.0, (Mountain View, CA: The Unicode Consortium,
2018. ISBN 978-1-936213-19-1)
The terms “Version 11.0” or “Unicode 11.0” are abbreviations for the full version reference, Version 11.0.0.
The citation and permalink for the latest published version of the Unicode Standard is:
The Unicode Consortium. The Unicode Standard.
A complete specification of the contributory files for Unicode
11.0 is found on the page Components for 11.0.0.
That page also provides the recommended reference format for Unicode Standard Annexes. For examples of how to cite particular portions of the Unicode Standard, see also the Reference Examples.
Errata incorporated into Unicode 11.0 are listed by date in
a separate table. For corrigenda and errata after the release of Unicode 11.0, see the list of current
Updates and Errata.
There were no significant changes to the Stability Policy of the core specification between Unicode 10.0 and Unicode 11.0.
Seven new scripts were added with accompanying new block descriptions: Dogra, Gunjala Gondi, Hanifi Rohingya, Makasar, Medefaidrin, Old Sogdian, and Sogdian.
Changes in the Unicode Standard Annexes are listed in Section G.
Character Assignment Overview
684 characters have been added.
Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see
delta code charts.
The detailed listing of all changes to the contributory data files of the Unicode Character Database
for Version 11.0 can be found in
UAX #44, Unicode Character Database.
The changes listed there include character additions and property revisions to existing characters that will affect implementations.
Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in Section M, Implications for Migration.
In Version 11.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section
of each UAX, linked directly from the following list of UAXes.
|Unicode Standard Annex
Unicode Bidirectional Algorithm
|Clarified the explanation of how paragraph separators are handled in X8.
East Asian Width
|Added a note in Section 2 that the East_Asian_Width property was never intended to be used by modern terminal emulators, especially with Unicode's current repertoire.
Unicode Line Breaking Algorithm
|Updated Rule LB8a to handle the same set of pictographic symbols in the line breaking of emoji ZWJ sequences as is used for text segmentation in UAX #29.
Unicode Normalization Forms
|Section 5, Composition Exclusion, was rewritten for clarity and correctness.
Unicode Script Property
|No significant changes in this version.
Unicode Text Segmentation
|Added support for Indic virama handling to extended grapheme clusters. Added use of the Extended_Pictographic property from Emoji 11.0, to simplify the statement of emoji-related rules for grapheme cluster boundaries and word boundaries. Added a table of formal regex definitions to rationalize the definition of the classes used for grapheme cluster boundaries.
Unicode Identifier and Pattern Syntax
|Refined the use of ZWJ in identifiers (adding some restrictions and relaxing others slightly), added the new scripts for Version 11.0, and broadened the definition of hashtag identifiers.
Unicode Named Character Sequences
|No significant changes in this version.
Unicode Han Database (Unihan)
|Added five fields and improved regular expressions. Documented extension of Unihan properties to non-Unihan characters.
Common References for Unicode Standard Annexes
|Updated all references for Unicode 11.0.
Unicode Character Database in XML
|Added new code point attributes, values, and patterns.
Unicode Character Database
|Added new property Equivalent_Unified_Ideograph to the property table. Added regular expressions for the validation of Bidi_Paired_Bracket and Equivalent_Unified_Ideograph to Table 21. Updated the discussion of emoji variation sequences. Provided further clarification about the range of numeric values allowed for the Age property.
U-Source Ideographs
|Improved documentation for identifier prefixes.
Unicode Vertical Text Layout
|Section 4, Tailorings, was removed, because its content was no longer useful.
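The composition exclusions covered by Section 5 of Unicode Normalization Forms, listed in the table above, can be observed directly with Python's `unicodedata.normalize`. This is only a small sketch: the helper `nfd_then_nfc` is a made-up name, and U+0958 DEVANAGARI LETTER QA is used as one example of an excluded character.

```python
import unicodedata

DEVANAGARI_QA = "\u0958"  # canonically decomposes to U+0915 U+093C
E_ACUTE = "\u00E9"        # canonically decomposes to U+0065 U+0301


def nfd_then_nfc(text):
    """Return (NFD form, NFC form of that NFD form) for the given text."""
    nfd = unicodedata.normalize("NFD", text)
    nfc = unicodedata.normalize("NFC", nfd)
    return nfd, nfc


# An ordinary precomposed letter is restored by NFC.
nfd, nfc = nfd_then_nfc(E_ACUTE)
print(nfc == E_ACUTE)        # True

# A composition-excluded character stays decomposed under NFC.
nfd, nfc = nfd_then_nfc(DEVANAGARI_QA)
print(nfc == DEVANAGARI_QA)  # False
print(nfc == "\u0915\u093C")  # True
```

Code that compares strings after NFC normalization therefore cannot assume that every precomposed character survives a round trip.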
There are also significant revisions in the Unicode Technical Standards whose
versions are synchronized with the Unicode Standard. The most important of these changes are listed below.
For the full details of all changes, see the Modifications section
of each UTS, linked directly from the following list of UTSes.
|Unicode Technical Standard
Unicode Collation Algorithm
|A clarification was added regarding search tailoring in scripts that use the visual order model. The DUCET table was updated to cover the Unicode 11.0 repertoire.
Unicode Security Mechanisms
|Added further discussion about the use of joining controls, including how advanced implementations may use script-specific information to determine behavior. Also refined the suggestions about checking certain kinds of combining sequences in spoof detection.
Unicode IDNA Compatibility Processing
|Changed the format of the test file to permit testing with different, arbitrary combinations of the input settings. The format of the input setting for Transitional_Processing was updated, and the table of IDNA Comparisons was updated to reflect Unicode 11.0 character additions.
Unicode Emoji
|The versioning of UTS #51 was bumped to 11.0, so that it now matches the Unicode version number associated with the latest emoji delta release. The Extended_Pictographic property for emoji was added and documented, to enable a more compact description of the behavior of emoji in segmentation algorithms. An emoji ZWJ sequence mechanism was added for hinting at glyph facing direction for some emoji. Documentation was added regarding the use of the four new hair emoji components. A discussion was added regarding the use of gender-neutral emoji.
Unicode 11.0 contains a significant number of changes that may affect implementations upgrading to Version 11.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.
Version 11.0 adds 7 new scripts, so implementations which process script data
should be carefully checked. Some of these scripts have particular attributes
which may cause issues for implementations.
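A quick way to see whether a runtime already knows about the seven new scripts is to test one code point from each block against its character database. The sketch below does this in Python. The sample code points are taken as the first code point of each new block, which is an assumption of this example and should be checked against the code charts. The name `unassigned_scripts` is illustrative.

```python
import unicodedata

# One sample code point per new script (assumed start of each block).
NEW_SCRIPT_SAMPLES = {
    "Dogra": 0x11800,
    "Gunjala Gondi": 0x11D60,
    "Hanifi Rohingya": 0x10D00,
    "Makasar": 0x11EE0,
    "Medefaidrin": 0x16E40,
    "Old Sogdian": 0x10F00,
    "Sogdian": 0x10F30,
}


def unassigned_scripts(samples=NEW_SCRIPT_SAMPLES):
    """Return the scripts whose sample code point is unassigned (Cn) here."""
    return sorted(
        name
        for name, cp in samples.items()
        if unicodedata.category(chr(cp)) == "Cn"
    )


missing = unassigned_scripts()
print("Runtime Unicode version:", unicodedata.unidata_version)
print("Scripts not yet known to this runtime:", missing or "none")
```

An empty result only shows that the code points are assigned. Script-specific behavior, such as right-to-left layout or casing, still needs to be tested separately.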
Standardized Variation Sequences
New Data Files Added to the UCD
There are numerous changes in the representative glyphs, some backed by
There are also glyph changes in the text presentation of a number of emoji and emoticons.
Some of those changes reflect an attempt to make the text presentation glyphs for
emoji converge on common practice among vendors for the emoji presentation glyphs.
Such glyph changes are highlighted in violet in the
delta code charts for Version 11.0.