Submitting Character Proposals
The Unicode Consortium accepts proposals for inclusion of new
characters and scripts in the Unicode Standard. Those considering
submitting a proposal should first determine whether or not a
particular script or character has already been proposed.
Please see the Proposed New
Characters -- Pipeline Table page for
information on additions to the Unicode Standard which are already
under consideration. General guidelines for the preparation of a proposal appear below.
The Unicode Standard definition of character is stated in the
Glossary of Unicode
Terms. Before preparing a proposal, sponsors should note in
particular the distinction between the terms character and glyph as
therein defined. Because of this distinction, graphics such as
ligatures, conjunct consonants, minor variant written forms, or
abbreviations of longer forms are generally not acceptable as
Unicode characters. Also see Where
is my Character?
The sponsor(s) proposing the addition of a new character to the
Unicode Standard should follow these guidelines.
Proposals for new emoji need to meet different criteria, however. To propose new emoji, follow the Guidelines for Submitting Unicode Emoji Proposals instead of the rest of this section.
Before proceeding, determine that each proposed addition is a
character according to the definition given in the Unicode
Standard and that the proposed addition does not already exist in
the Standard. Consult the
Proposed New Characters page to see if the character is
already on track to be encoded, and the
Archive of
Nonapproval Notices to see if the character has already been
considered but was disapproved for some reason.
Often a proposed character can be expressed as a sequence of
one or more existing Unicode characters. Such a character would be
a duplicate representation and is thus not suitable for encoding.
(In any event, the proposed character would
disappear when normalized.) For example, a g-umlaut character is
not suitable for encoding, since it can already be expressed with
the sequence <g, combining diaeresis>. For further information on
such sequences see
Where is my
Character and the FAQ page
Characters, Combining Marks.
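As a rough illustration of the g-umlaut example above, the following Python sketch uses the standard-library unicodedata module to inspect the sequence <g, combining diaeresis> under normalization. The helper name describe is invented for this illustration, and the results reflect whichever version of the Unicode Character Database ships with the Python interpreter in use.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"


def describe(text):
    """Return each code point in text as a 'U+XXXX NAME' string (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# <g, combining diaeresis>: no precomposed character exists, so NFC
# leaves the sequence as two code points.
g_umlaut = "g" + COMBINING_DIAERESIS
nfc_g = unicodedata.normalize("NFC", g_umlaut)

# <u, combining diaeresis>: a precomposed character (U+00FC) exists, so
# NFC composes the pair and NFD decomposes it again.
u_umlaut = "u" + COMBINING_DIAERESIS
nfc_u = unicodedata.normalize("NFC", u_umlaut)
nfd_u = unicodedata.normalize("NFD", nfc_u)

print(describe(nfc_g))
print(describe(nfc_u))
print(describe(nfd_u))
```

A quick check of this kind can help a sponsor confirm that a candidate character is already representable as a sequence before a proposal is drafted. It does not replace consulting the FAQ pages mentioned above.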
Ensure that documentation supporting the proposal states
whether any Unicode characters were examined as possible
equivalents for the proposed character and, if so, why each was
rejected. Consult the
Unicode
Character Encoding Stability
Policy to make sure that any associated change to
existing characters is in accordance with Consortium policies.
Determine and list the proposed (or recommended) character
properties for each character being proposed, especially when
proposing entire scripts for encoding. See the
Unicode Properties in Character Proposals
for guidelines about character properties and a list of
questions to help make determinations about appropriate property
values. See also Chapter 4, Character Properties of
The Unicode Standard. Even a partial list of properties will be helpful
in the initial proposal.
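When drafting a property list, it can help to look at the properties of an already-encoded character that behaves similarly. The sketch below reads a few properties through Python's standard-library unicodedata module. The function name basic_properties and the dictionary keys are chosen for this illustration only, and the module exposes just a small subset of the properties discussed in Chapter 4.

```python
import unicodedata


def basic_properties(ch):
    """Collect a few UCD properties for one character (illustrative subset only)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, ""),
        "general_category": unicodedata.category(ch),
        "canonical_combining_class": unicodedata.combining(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "decomposition": unicodedata.decomposition(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


# Use an existing combining mark as a model for a proposed one.
reference = basic_properties("\u0308")
for key, value in reference.items():
    print(f"{key}: {value}")
```

The values printed for a comparable existing character are only a starting point; the proposed values still need to be justified for each new character.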
Proposals to include entire scripts (Egyptian hieroglyphs,
for example) must cite modern, definitive sources of information
regarding such scripts. Sponsorship by the relevant academic
bodies (such as The International Association of Egyptologists)
may be helpful in determining the proper scope for encoding of
characters in such cases. Before submitting full script
proposals, sponsors should also determine that a proposal does not
already exist for that script, for example by consulting the Roadmaps.
If a proposed character is part of a dead language or
obsolete/rare script that is already encoded, cite the most
important modern sources of information on the script and the
proposed additions. Names, including academic affiliation, of
researchers in the relevant field are welcomed.
If the proposed characters exhibit shaping behavior
(contextual shaping, ligatures, conjuncts, or stacking), provide a
description of that behavior, preferably with glyph examples. It
should be sufficient so that software engineers can produce a
minimally acceptable rendering of the characters.
If the proposed characters are symbols, consult the
Criteria for Encoding Symbols to gain familiarity
with some of the criteria that the UTC will consider when
determining whether new symbols are appropriate for
encoding. Research other already-encoded blocks of symbols
in the standard to check that the types of symbols in the
proposal have precedents. Also, because symbols often vary
widely in appearance, check carefully that the symbol(s) in
the proposal are not merely font-specific variant shapes of
symbols already encoded in the standard.
Information about the sorting order of proposed characters
should also be provided, where known. For general information
about sorting, see Collation. In particular,
consider the
UCA Default Table Criteria
for New Characters,
which specifies the criteria the UTC uses for making initial
determinations about collation weights for newly encoded
characters.
The Unicode Consortium works closely with the relevant
committee responsible for ISO/IEC 10646, namely ISO/IEC JTC 1/SC 2/WG 2, in
proposing additions as well as monitoring the status of proposals
by various national bodies. Therefore, proposals may eventually be
formulated as ISO/IEC documents and significant detailed
information will be required.
The standardized form "ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL
SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE
REPERTOIRE OF ISO/IEC 10646" has been designed for the purpose of
obtaining detailed information for ISO purposes and for the
Unicode Technical Committee. Use of this form is required for all
proposals. It is available at the following URL:
https://www.unicode.org/L2/summary.html
To complete the Proposal Summary Form, sponsors may wish to
refer to the WG2 Principles and Procedures document, also accessible
from that URL. That document contains context and explanations
about the various questions on the Proposal Summary Form.
Before "finally approving" additions, we require a font with an
appropriate license for printing the standard (see
Font
Submissions Policy). Even if approved,
additions won't be published in a version of the standard unless
suitable fonts are available.
A Contributor License Agreement is Required
The Unicode Consortium’s mission is to enable people around the world to use computers in any language. In furtherance of this mission, the Consortium makes its standards, specifications, software, and data freely available to all users around the world under its Unicode Terms of Use and various highly permissive open-source licenses. In order to make its products freely available in this manner, the Consortium needs permission from contributors to freely use, modify, and distribute their contributions as part of the Consortium’s products.
The Consortium has adopted a standard Contributor License Agreement (CLA) for this purpose. The Unicode CLA ensures that a contributor retains ownership of any intellectual property rights in their contribution while granting the Unicode Consortium the necessary legal rights to use, modify, and distribute that contribution in Consortium products. Unicode CLAs are based on the Apache Software Foundation's CLAs, which are well-known in the industry and widely adopted by many respected open source projects.
For further information, please see the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies.
Who needs to sign a Unicode CLA for Script & Character Proposals?
There are two categories of contributors who need to sign a Unicode CLA for script and character proposals.
The primary category is all “authors” of the proposal. “Authors” include people who draft or otherwise prepare any significant portion of the proposal, including any data compilations, charts, or other exhibits or appendices. In this context, proposals may have multiple authors, and all authors, not just the primary author, are required to sign a Unicode CLA.
Please note that authors should not be confused with sponsors of a proposal. A person, entity, or national body may join a proposal or sponsor it without being an author; “authors” are limited to those who draft or otherwise prepare any significant portion of the proposal.
Important: Proposals will not be considered and will not be eligible for posting to the Document Register unless and until a CLA is in place for all authors of the proposal.
The second category of contributors who need to sign a CLA are any persons or entities (other than the authors of the proposal) who have, may have, or claim intellectual property rights in the proposed character or script itself. This is an unusual scenario. Please see below for further information regarding the Consortium’s requirements in such circumstances.
How to Sign a Unicode CLA
Briefly, each proposal author will need to determine whether to sign an Individual CLA or a Corporate CLA. This depends on who owns the contribution being made: the contributor personally, the contributor’s employer, or some other corporate entity. It is the contributor’s responsibility to do the research necessary to make this determination.
In the case of a personal contribution not owned by any corporate entity, the contributing individual should sign the Unicode Individual CLA either electronically in GitHub (Unicode CLA Form) or in PDF format. Signing electronically in GitHub is strongly preferred and assists the Consortium in record-keeping.
If the contribution is owned by the contributor’s corporate employer or some other corporate entity, a Corporate CLA is required. Corporate CLAs cannot be signed in GitHub; they must be signed in PDF format and submitted to member-services@unicode.org. To check whether the Consortium already has a signed Corporate CLA on file for a particular company or other entity, please see the Public List of Corporate CLAs.
For further, more detailed instructions on how to sign a Unicode CLA in GitHub or in PDF format, please see How to Sign a Unicode CLA in the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies. If you have questions, please contact member-services@unicode.org.
Once a contributor, whether individual or corporate, has signed a Unicode CLA, they may continue to make additional contributions to the Unicode Consortium indefinitely without having to sign a CLA for each separate contribution.
IP Claimants in Scripts & Characters
As noted above, if there are any persons or entities who have, may have, or claim intellectual property rights (copyright, design, or patent rights) in a proposed character or script itself, then the Consortium requires two things of all such IP owners/claimants: (i) that they sign a Unicode CLA or other appropriate license agreement, and (ii) that they provide a formal written endorsement of the proposal. For instructions on how to sign a standard Unicode CLA, please see above, as well as the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies. To provide the required written endorsement from an IP owner/claimant in the proposed characters/scripts, please send an email to script-proposals@unicode.org from an email account that is identifiable as that of the IP owner/claimant and provide the endorsement of the proposal, clearly identifying the proposal by name, date, and author(s).
This will be an unusual scenario – the vast majority of scripts and characters that are in scope for encoding in the Unicode Standard are generally not subject to intellectual property protection for a variety of reasons. However, “fictional” languages/scripts, such as Elvish from Lord of the Rings, may be subject to copyright protection depending on the particular circumstances and jurisdiction. Additionally, there are some scripts (whether fictional or not) in which the script creator expressly claims copyright or other IP rights and/or has registered such rights.
Whether “fictional” and “created” languages/scripts are in fact subject to intellectual property protection is disputed by some and is not an area of well-settled law around the world. The Consortium acknowledges that there is no clear consensus on these questions in every jurisdiction. Nevertheless, in the interests of making Unicode standards, specifications, data, and software as widely and freely available as possible, it is Consortium policy that a CLA or similar license is required in these cases.
Proposers are required to identify any such potential IP owners or claimants in their proposals and should obtain the formal endorsement of such owners/claimants. The Consortium will not consider proposals that are not endorsed in writing by all IP claimants in the proposed characters/scripts. The Consortium does not have the resources to research and vet prior IP rights, and in cases where a proposal is not endorsed in writing by all IP claimants, and/or fails to provide sufficient information regarding IP rights/claims, the Consortium will have little choice but to decline to encode.
When potential IP owners in the script/characters are identified, the Consortium will need to review the circumstances and consider whether a standard CLA or other similar license best meets the needs for encoding. Proposers and IP claimants should provide as much information as possible about claimed IP rights in such cases to facilitate the Consortium’s review and to increase the chances that the Consortium will be able to encode.
The submission must include all of the following information:
1. The completed Proposal Summary Form, which (in summary) requires the following:
- the repertoire, including proposed character names;
- the name and contact information for the company or individual who will provide a computerized font (TrueType or PostScript) for publication of the standard (see the Font Submissions Policy for further information regarding font requirements);
- references to dictionaries and descriptive texts establishing authoritative information;
- names and addresses of appropriate contacts within national body or user organizations;
- the context within which the proposed characters are used (for example, current, historical, and so on);
- especially for sporadic additions, what similarities or relationships the proposed characters bear to existing characters already encoded in the standard.
2. All additional relevant information for your proposal as described above in the Proposal Guidelines.
3. All information required by the above Legal & Licensing Requirements, namely,
- names and contact information (country of residence, email address, and website address, if one exists) for all of the following:
  - all proposal authors;
  - all proposal sponsors/endorsers;
  - any individual or entity who may own or claim intellectual property rights (copyright, design, or patent rights) in the proposed scripts/characters themselves, together with all available information regarding the nature and extent of such IP rights (the more information you can provide in this regard, the better); and
- for each proposal author, an affirmation that
  - the author has signed an Individual CLA, identifying whether the CLA was signed in GitHub or in PDF; or
  - the author’s contribution is owned by their employer and is covered by their employer’s existing Corporate CLA on file with the Consortium, identifying the employer by company name so that the existence of the Corporate CLA can be verified. (If you wish to check whether your employer has signed a Unicode CLA, see the Public Unicode Corporate CLA List.)
The foregoing legal information should not be provided in the proposal documentation itself, but only in the body of the email you send to script-proposals@unicode.org forwarding the proposal documents. Proposal documentation may be publicly posted in the UTC Document Register when the proposal is formally forwarded to the UTC for consideration, so personally identifiable and other legal information should appear only in the accompanying email. If you do include such information in the proposal documentation, you consent to its publication in the UTC Document Register.
All proposals (whether or not successful) and related materials will be retained by the Unicode Consortium as a matter of record and may be used for any legitimate Consortium purpose subject to the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies.
The international standardization of entire scripts requires a
significant effort on the sponsor's part. It frequently takes years
to move from an initial draft to final standardization, particularly
because of the requirements to synchronize proposals with the work
done in the ISO committee responsible for the development of ISO/IEC
10646.
Experience has shown that it is often helpful to discuss preliminary proposals
before submitting a detailed proposal. One option is to
become a member of the
Unicode Consortium, and submit the proposal to the members-only email list. Alternatively, sponsors can contact UC Berkeley’s
Script Encoding
Initiative for initial review.
Each proposal received will be evaluated initially by technical
officers of the Unicode Consortium and the result of this initial evaluation
will be communicated to the sponsor(s) of the proposal. Once a
proposal passes this initial screening, it will be reviewed by the
Unicode
Technical Committee.
Sponsors, particularly of entire scripts, should
be prepared to become involved at various times throughout the
process -- perhaps revising their proposals more than once;
collecting further detailed information; organizing online
discussions or meetings to dispel controversy; or answering
questions posed by committees or national bodies. Without such
involvement, any proposal of more than a few characters is unlikely
to be successful in the long run.
Sponsors can monitor the further progress of their proposals via
the
public
UTC minutes as well as the Proposed New
Characters -- Pipeline Table page.
Many good proposals can be found in the
UTC document
register.
Thesaurus Linguae
Graecae has prepared a number of successful proposals.
For people interested in proposing a single symbol or a small
set of symbols for encoding, there are also many successful
proposals in the UTC document register. For example, see the
proposal for power symbols.
There are ways for programmers and scholarly organizations to
make use of Unicode character encoding, even if the script they want
to use or transmit is not yet (or may never be) part of the Unicode
Standard. Individual groups that make use of rare scripts or special
characters can reach a private agreement about interchange and set
aside part of the Private Use Area to encode their private set of
characters. Individuals with interests in rare scripts or
materials relating to them may sometimes be contacted through
electronic mailing lists which the Consortium maintains. For
information about these mailing lists, please
contact the
Unicode office.
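To make the Private Use Area option more concrete, here is a minimal Python sketch of how a group might record a private agreement and check that its assignments fall inside private-use ranges. The three ranges are those the Unicode Standard reserves for private use. The mapping PRIVATE_AGREEMENT, its character names, and both function names are hypothetical and stand in for whatever a real group would agree on.

```python
# Private-use ranges defined by the Unicode Standard.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)

# Hypothetical private agreement: invented names mapped to PUA code points.
PRIVATE_AGREEMENT = {
    "EXAMPLE SCRIPT LETTER ONE": 0xE000,
    "EXAMPLE SCRIPT LETTER TWO": 0xE001,
}


def is_private_use(code_point):
    """Return True if code_point lies in one of the private-use ranges."""
    return any(low <= code_point <= high for low, high in PRIVATE_USE_RANGES)


def validate_agreement(agreement):
    """Return the names whose code points fall outside the private-use ranges."""
    return [name for name, cp in agreement.items() if not is_private_use(cp)]


problems = validate_agreement(PRIVATE_AGREEMENT)
print("assignments outside private use:", problems)
```

Text encoded this way is only interpretable by parties that share the same agreement, which is why such an agreement must be reached privately among the groups exchanging the data.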
To send completed proposals or to make further inquiries, please
see the
Document Submission Details page.
All proposals are required to be in one of the following forms:
- PDF format (preferred)
- HTML along with any needed GIF or JPEG images (a ZIP file or
TAR archive should be made, including all of the required files)
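For the HTML option, one possible way to bundle the page and its images into a single ZIP file is sketched below using Python's standard-library zipfile module. The function name bundle_proposal, the set of accepted file suffixes, and the directory layout are assumptions made for this illustration, not requirements stated by the Consortium.

```python
import zipfile
from pathlib import Path

# Assumed set of file types to include: HTML pages plus GIF and JPEG images.
ALLOWED_SUFFIXES = {".html", ".htm", ".gif", ".jpg", ".jpeg"}


def bundle_proposal(source_dir, archive_path):
    """Write the HTML and image files under source_dir into a ZIP archive.

    Returns the archive member names in the order they were added.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file() and path.suffix.lower() in ALLOWED_SUFFIXES:
                # Store paths relative to source_dir so links in the HTML still resolve.
                member = path.relative_to(source).as_posix()
                archive.write(path, member)
                added.append(member)
    return added
```

Calling bundle_proposal("my-proposal", "my-proposal.zip"), where both names are placeholders, would produce an archive containing the HTML file and any images beside it. Files of other types are skipped.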