Submitting Character Proposals
The Unicode Consortium accepts proposals for inclusion of new
characters and scripts in the Unicode Standard. Those considering
submitting a proposal should first determine whether or not a
particular script or character has already been proposed.
Please see the Proposed New
Characters -- Pipeline Table page for
information on additions to the Unicode Standard which are already
under consideration. General guidelines for the preparation of a proposal appear below.
The Unicode Standard definition of character is stated in the
Glossary of Unicode
Terms. Before preparing a proposal, sponsors should note in
particular the distinction between the terms character and glyph as
therein defined. Because of this distinction, graphics such as
ligatures, conjunct consonants, minor variant written forms, or
abbreviations of longer forms are generally not acceptable as
Unicode characters. Also see Where
is my Character?
The sponsor(s) proposing the addition of a new character to the
Unicode Standard should follow these guidelines.
Proposals for new emoji need to meet different criteria, however. To propose new emoji, follow the Guidelines for Submitting Unicode Emoji Proposals instead of the rest of this section.
Before proceeding, determine that each proposed addition is a
character according to the definition given in the Unicode
Standard and that the proposed addition does not already exist in
the Standard. Consult the
Proposed New Characters page to see if the character is
already on track to be encoded, and the
Archive of
Nonapproval Notices to see if the character has already been
considered but was disapproved for some reason.
Often a proposed character can be expressed as a sequence of
one or more existing Unicode characters. Encoding such a character
would create a duplicate representation, so it is not suitable for
encoding. (In any event, normalization would replace the proposed
character with the equivalent existing sequence.) For example, a
g-umlaut character is not suitable for encoding, since it can already
be expressed with the sequence <g, combining diaeresis>. For further
information on such sequences, see
Where is my
Character? and the FAQ page
Characters and Combining Marks.
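The relationship between precomposed characters and combining sequences can be checked with Python's standard unicodedata module. This is only an illustrative sketch: it compares an already-encoded precomposed letter (U+00E4) with the g-umlaut sequence, and the results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# An already-encoded precomposed letter: U+00E4 LATIN SMALL LETTER A WITH DIAERESIS.
a_umlaut = "\u00e4"
# NFD splits it into the base letter plus U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", a_umlaut)
print([f"U+{ord(c):04X}" for c in decomposed])

# "g-umlaut" has no precomposed code point; it is written as <g, combining diaeresis>.
g_umlaut = "g\u0308"
# NFC has nothing to compose this pair into, so the two-code-point sequence is kept.
composed = unicodedata.normalize("NFC", g_umlaut)
print([f"U+{ord(c):04X}" for c in composed])
print(unicodedata.name(composed[1]))
```

Because the sequence already renders and normalizes correctly, a separate g-umlaut code point would add nothing but a second spelling of the same text.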
Ensure that documentation supporting the proposal states
whether any Unicode characters were examined as possible
equivalents for the proposed character and, if so, why each was
rejected. Consult the
Unicode
Character Encoding Stability
Policy to make sure that any associated change to
existing characters is in accordance with Consortium policies.
Determine and list the proposed (or recommended) character
properties for each character being proposed, especially when
proposing entire scripts for encoding. See the
Unicode Properties in Character Proposals page
for guidelines about character properties and a list of
questions to help make determinations about appropriate property
values. See also Chapter 4, "Character Properties," of
The Unicode Standard. Even a partial list of properties will be helpful
in the initial proposal.
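One practical starting point is to look up the properties of already-encoded characters that behave like the proposed ones. The sketch below assumes Python's standard unicodedata module; the helper name describe is hypothetical, and it covers only the handful of properties that module exposes (it has no Script or Line_Break lookup, for instance), so it is not a substitute for the full property list requested above.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Hypothetical helper: collect the properties unicodedata exposes for one code point."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "canonical_combining_class": unicodedata.combining(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "decomposition": unicodedata.decomposition(ch),
        "mirrored": unicodedata.mirrored(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }

# Compare a combining mark with a script letter as models for new characters.
for ch in ["\u0308", "\u0905"]:
    print(describe(ch))
```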
Proposals to include entire scripts (Egyptian hieroglyphs,
for example) must cite modern, definitive sources of information
regarding such scripts. Sponsorship by the relevant academic
bodies (such as the International Association of Egyptologists)
may be helpful in determining the proper scope for encoding of
characters in such cases. Before submitting full script
proposals, sponsors should also determine that a proposal does not
already exist for that script, for example, by consulting the Roadmaps.
If a proposed character is part of a dead language or
obsolete/rare script that is already encoded, cite the most
important modern sources of information on the script and the
proposed additions. The names and academic affiliations of
researchers in the relevant field are also welcome.
If the proposed characters exhibit shaping behavior
(contextual shaping, ligatures, conjuncts, or stacking), provide a
description of that behavior, preferably with glyph examples. The
description should be detailed enough for software engineers to
produce a minimally acceptable rendering of the characters.
If the proposed characters are symbols, consult the
Criteria for Encoding Symbols to gain familiarity
with some of the criteria that the UTC will consider when
determining whether new symbols are appropriate for
encoding. Research other already-encoded blocks of symbols
in the standard to check that the types of symbols in the
proposal have precedents. Also, because symbols often vary
widely in appearance, check carefully that the symbol(s) in
the proposal are not merely font-specific variant shapes of
symbols already encoded in the standard.
Information about the sorting order of proposed characters
should also be provided, where known. For general information
about sorting, see Collation. In particular,
consider the
UCA Default Table Criteria
for New Characters,
which specifies the criteria the UTC uses for making initial
determinations about collation weights for newly encoded
characters.
The Unicode Consortium works closely with the
committee responsible for ISO/IEC 10646, namely ISO/IEC JTC 1/SC 2/WG 2, in
proposing additions as well as monitoring the status of proposals
by various national bodies. Proposals may therefore eventually be
formulated as ISO/IEC documents, which require significant detailed
information.
The standardized form "ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL
SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE
REPERTOIRE OF ISO/IEC 10646" has been designed for the purpose of
obtaining detailed information for ISO purposes and for the
Unicode Technical Committee. Use of this form is required for all
proposals. It is available at the following URL:
https://www.unicode.org/L2/summary.html
To complete the Proposal Summary Form, sponsors may wish to
refer to the WG2 Principles and Procedures document, also accessible
from that URL. That document contains context and explanations
about the various questions on the Proposal Summary Form.
Before "finally approving" additions, we require a font with an
appropriate license for printing the standard (see
Font
Submissions Policy). Even if approved,
additions won't be published in a version of the standard unless
suitable fonts are available.
The proposal summary form requires the following information
(paraphrased):
- the repertoire, including proposed character names;
- the name and contact information for a company or individual
who would agree to provide a computerized font (TrueType or
PostScript) for publication of the standard;
- references to dictionaries and descriptive texts establishing
authoritative information;
- names and addresses of appropriate contacts within national
bodies or user organizations;
- the context within which the proposed characters are used (for
example, current, historical, and so on);
- especially for sporadic additions, what similarities or
relationships the proposed characters bear to existing characters
already encoded in the standard.
All proposals (whether successful or not) and related materials
will be retained by the Unicode Consortium as a matter of record and
may be used for any purpose.
The international standardization of entire scripts requires a
significant effort on the sponsor's part. It frequently takes years
to move from an initial draft to final standardization, particularly
because of the requirements to synchronize proposals with the work
done in the ISO committee responsible for the development of ISO/IEC
10646.
Experience has shown that it is often helpful to discuss preliminary proposals
before submitting a detailed proposal. One option is to
become a member of the
Unicode Consortium and submit the proposal to the members-only email list. Alternatively, sponsors can contact the
Script Encoding
Initiative at UC Berkeley for initial review.
Each proposal received will be evaluated initially by technical
officers of the Unicode Consortium and the result of this initial evaluation
will be communicated to the sponsor(s) of the proposal. Once a
proposal passes this initial screening, it will be reviewed by the
Unicode
Technical Committee.
Sponsors, particularly of entire scripts, should
be prepared to become involved at various times throughout the
process -- perhaps revising their proposals more than once;
collecting further detailed information; organizing online
discussions or meetings to dispel controversy; or answering
questions posed by committees or national bodies. Without such
involvement, any proposal of more than a few characters is unlikely
to be successful in the long run.
Sponsors can monitor the further progress of their proposals via
the
public
UTC minutes as well as the Proposed New
Characters -- Pipeline Table page.
Many good proposals can be found in the
UTC document
register.
Thesaurus Linguae
Graecae has prepared a number of successful proposals.
For people interested in proposing a single symbol or a small
set of symbols for encoding, there are also many successful
proposals in the UTC document register. For example, see the
proposal for power symbols.
There are ways for programmers and scholarly organizations to
make use of Unicode character encoding, even if the script they want
to use or transmit is not yet (or may never be) part of the Unicode
Standard. Individual groups that make use of rare scripts or special
characters can reach a private agreement about interchange and set
aside part of the Private Use Area to encode their private set of
characters. Individuals with interests in rare scripts or
materials relating to them may sometimes be contacted through
electronic mail lists that the Consortium maintains. For
information about these lists, please
contact the
Unicode office.
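As a rough illustration of a Private Use Area agreement, the sketch below lists the three private-use ranges defined by the Unicode Standard and checks a small set of assigned code points. The symbol names and their assignments (EXAMPLE SIGN ONE at U+E000, and so on) are purely hypothetical; Private Use code points carry no standard meaning, so correct interchange depends entirely on the parties' agreement and on fonts that support it.

```python
import unicodedata

# The three Private Use ranges defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in one of the Private Use ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Hypothetical private agreement: a group assigns its own signs starting at U+E000.
PRIVATE_SIGNS = {
    "EXAMPLE SIGN ONE": 0xE000,
    "EXAMPLE SIGN TWO": 0xE001,
}

for name, cp in PRIVATE_SIGNS.items():
    ch = chr(cp)
    # unicodedata reports General_Category "Co" (private use) for these code points.
    assert is_private_use(cp) and unicodedata.category(ch) == "Co"
    print(f"U+{cp:04X} {name} -> UTF-8 bytes {ch.encode('utf-8').hex()}")
```

The text itself is ordinary Unicode and survives any UTF conversion; only the interpretation of these code points is private.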
To send completed proposals or to make further inquiries, please
see the
Document Submission Details page.
All proposals are required to be in one of the following forms:
- PDF format (preferred)
- HTML along with any needed GIF or JPEG images (a ZIP file or
TAR archive should be made, including all of the required files)