
Submitting Character Proposals

General Information

The Unicode Consortium accepts proposals for inclusion of new characters and scripts in the Unicode Standard. Those considering submitting a proposal should first determine whether a particular script or character has already been proposed. Please see the Proposed New Characters -- Pipeline Table page for information on additions to the Unicode Standard that are already under consideration. General guidelines for the preparation of a proposal appear below.

The Unicode Standard definition of character is stated in the Glossary of Unicode Terms. Before preparing a proposal, sponsors should note in particular the distinction between the terms character and glyph as therein defined. Because of this distinction, graphics such as ligatures, conjunct consonants, minor variant written forms, or abbreviations of longer forms are generally not acceptable as Unicode characters. Also see Where is my Character?

Proposal Guidelines

The sponsor(s) proposing the addition of a new character to the Unicode Standard should follow these guidelines.

Proposals for new emoji need to meet different criteria, however. To propose new emoji, follow the Guidelines for Submitting Unicode Emoji Proposals instead of the rest of this section.

Before proceeding, determine that each proposed addition is a character according to the definition given in the Unicode Standard and that the proposed addition does not already exist in the Standard. Consult the Proposed New Characters page to see if the character is already on track to be encoded, and the Archive of Nonapproval Notices to see if the character has already been considered but was disapproved for some reason.

Often a proposed character can already be expressed as a sequence of one or more existing Unicode characters. Encoding such a character would create a duplicate representation, so it is not suitable for encoding. (In any event, the proposed character would disappear when normalized.) For example, a g-umlaut character is not suitable for encoding, since it can already be expressed with the sequence <g, combining diaeresis>. For further information on such sequences, see Where is my Character? and the FAQ page Characters, Combining Marks.
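The g-umlaut example can be checked with a short script. This is a minimal sketch using Python's standard `unicodedata` module; the helper names `nfc` and `code_points` are illustrative and not part of any Unicode tooling. It shows only how existing sequences behave under normalization, not what would happen to a hypothetical newly encoded character.

```python
import unicodedata

# U+0308 COMBINING DIAERESIS
COMBINING_DIAERESIS = "\u0308"

def nfc(text):
    """Return the NFC (canonically composed) form of text."""
    return unicodedata.normalize("NFC", text)

def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

# <u, combining diaeresis> has a precomposed equivalent (U+00FC),
# so NFC composes the sequence into a single code point.
print(code_points(nfc("u" + COMBINING_DIAERESIS)))  # ['U+00FC']

# <g, combining diaeresis> has no precomposed equivalent, so the
# two-code-point sequence is itself the normalized representation.
print(code_points(nfc("g" + COMBINING_DIAERESIS)))  # ['U+0067', 'U+0308']
```

Running a candidate character's likely component sequence through a check like this is a quick way to see whether the sequence is already a valid, stable representation.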

Ensure that documentation supporting the proposal states whether any Unicode characters were examined as possible equivalents for the proposed character and, if so, why each was rejected. Consult the Unicode Character Encoding Stability Policy to make sure that any associated change to existing characters is in accordance with Consortium policies.

Determine and list the proposed (or recommended) character properties for each character being proposed, especially when proposing entire scripts for encoding. See the Unicode Properties in Character Proposals for guidelines about character properties and a list of questions to help make determinations about appropriate property values. See also Chapter 4, Character Properties of The Unicode Standard. Even a partial list of properties will be helpful in the initial proposal.
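When drafting a property list, it can help to look at the property values of a comparable character that is already encoded. The following is a rough sketch using Python's standard `unicodedata` module; the `describe` helper and the choice of properties shown are assumptions made for illustration, and the module exposes only a subset of the properties a proposal may need to address.

```python
import unicodedata

def describe(ch):
    """Collect a few basic property values for an encoded character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
    }

# Inspect an existing combining mark as a model for a proposed one.
for key, value in describe("\u0308").items():
    print(f"{key}: {value}")
```

For U+0308 this reports a general category of `Mn`, a bidirectional class of `NSM`, and a canonical combining class of 230, which is the kind of information a proposal for a similar combining mark would be expected to state.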

Proposals to include entire scripts (Egyptian hieroglyphs, for example) must cite modern, definitive sources of information regarding such scripts. Sponsorship by the relevant academic bodies (such as The International Association of Egyptologists) may be helpful in determining the proper scope for encoding of characters in such cases. Before submitting full script proposals, sponsors should also determine that a proposal does not already exist for that script, for example by consulting the Roadmaps.

If a proposed character is part of a dead language or obsolete/rare script that is already encoded, cite the most important modern sources of information on the script and the proposed additions. Names, including academic affiliation, of researchers in the relevant field are welcomed.

If the proposed characters exhibit shaping behavior (contextual shaping, ligatures, conjuncts, or stacking), provide a description of that behavior, preferably with glyph examples. The description should be detailed enough for software engineers to produce a minimally acceptable rendering of the characters.

If the proposed characters are symbols, consult the Criteria for Encoding Symbols to gain familiarity with some of the criteria that the UTC will consider when determining whether new symbols are appropriate for encoding. Research other already-encoded blocks of symbols in the standard to check that the types of symbols in the proposal have precedents. Also, because symbols often vary widely in appearance, check carefully that the symbol(s) in the proposal are not merely font-specific variant shapes of symbols already encoded in the standard.

Information about the sorting order of proposed characters should also be provided, where known. For general information about sorting, see Collation. In particular, consider the UCA Default Table Criteria for New Characters, which specifies the criteria the UTC uses for making initial determinations about collation weights for newly encoded characters.
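One reason sorting information matters is that plain code point order rarely matches the order readers expect. The sketch below uses only Python's built-in `sorted`, which compares strings by code point; it is not an implementation of the Unicode Collation Algorithm, and the sample words are arbitrary.

```python
# Python compares strings code point by code point, so uppercase
# Latin letters sort before lowercase ones, and accented letters
# sort after the entire basic Latin alphabet.
words = ["apple", "Zebra", "\u00e9clair"]  # "\u00e9" is e with acute accent

by_code_point = sorted(words)
print(by_code_point)  # ['Zebra', 'apple', 'éclair']
```

A dictionary-style ordering would normally place "éclair" between "apple" and "Zebra", which is why a proposal should say where new characters belong relative to existing ones instead of leaving the order to code point position.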

The Unicode Consortium works closely with the relevant committee responsible for ISO/IEC 10646, namely ISO/IEC JTC 1/SC 2/WG 2, in proposing additions as well as monitoring the status of proposals by various national bodies. Therefore, proposals may eventually be formulated as ISO/IEC documents, and significant detailed information will be required.

The standardized form "ISO/IEC JTC 1/SC 2/WG 2 PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646" has been designed for the purpose of obtaining detailed information for ISO purposes and for the Unicode Technical Committee. Use of this form is required for all proposals. It is available at the following URL:

https://www.unicode.org/L2/summary.html

To complete the Proposal Summary Form, sponsors may wish to refer to the WG2 Principles and Procedures document, also accessible from that URL. That document contains context and explanations about the various questions on the Proposal Summary Form.

Before additions receive final approval, the Consortium requires a font with an appropriate license for printing the standard (see Font Submissions Policy). Even if approved, additions will not be published in a version of the standard unless suitable fonts are available.

Legal & Licensing Requirements for Script & Character Proposals

A Contributor License Agreement is Required

The Unicode Consortium’s mission is to enable people around the world to use computers in any language. In furtherance of this mission, the Consortium makes its standards, specifications, software, and data freely available to all users around the world under its Unicode Terms of Use and various highly permissive open-source licenses. In order to make its products freely available in this manner, the Consortium needs permission from contributors to freely use, modify, and distribute their contributions as part of the Consortium’s products.

The Consortium has adopted a standard Contributor License Agreement (CLA) for this purpose. The Unicode CLA ensures that a contributor retains ownership of any intellectual property rights in their contribution while granting the Unicode Consortium the necessary legal rights to use, modify, and distribute that contribution in Consortium products. Unicode CLAs are based on the Apache Software Foundation's CLAs, which are well-known in the industry and widely adopted by many respected open source projects.

For further information, please see the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies.

Who needs to sign a Unicode CLA for Script & Character Proposals?

There are two categories of contributors who need to sign a Unicode CLA for script and character proposals.

The primary category is all “authors” of the proposal. “Authors” include people who draft or otherwise prepare any significant portion of the proposal, including any data compilations, charts, or other exhibits or appendices. In this context, proposals may have multiple authors, and all authors, not just the primary author, are required to sign a Unicode CLA.

Please note that authors should not be confused with sponsors of a proposal. A person, entity, or national body may join or sponsor a proposal without being an author; “authors” are limited to those who draft or otherwise prepare any significant portion of the proposal.

Important: Proposals will not be considered and will not be eligible for posting to the Document Register unless and until a CLA is in place for all authors of the proposal.

The second category of contributors who need to sign a CLA are any persons or entities (other than the authors of the proposal) who have, may have, or claim intellectual property rights in the proposed character or script itself. This is an unusual scenario. Please see below for further information regarding the Consortium’s requirements in such circumstances.

How to Sign a Unicode CLA?

Briefly, each proposal author will need to determine whether to sign an Individual CLA or a Corporate CLA, depending on who owns the contribution being made: the contributor personally, or the contributor’s employer or some other corporate entity. It is the contributor’s responsibility to do the research necessary to make this determination.

In the case of a personal contribution not owned by any corporate entity, the contributing individual should sign the Unicode Individual CLA either electronically in GitHub (Unicode CLA Form) or in PDF format. Signing electronically in GitHub is strongly preferred and assists the Consortium in record-keeping.

In the case of a contribution owned by the contributor’s corporate employer or some other corporate entity, the Corporate CLA is required. Corporate CLAs cannot be signed in GitHub; they must be signed in PDF format and submitted to member-services@unicode.org. To check whether the Consortium already has a signed Corporate CLA on file for a particular company or other entity, please see the Public List of Corporate CLAs.

For further, more detailed instructions on how to sign a Unicode CLA in GitHub or in PDF format, please see How to Sign a Unicode CLA in the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies. If you have questions, please contact member-services@unicode.org.

Once a contributor, whether individual or corporate, has signed a Unicode CLA, they may continue to make additional contributions to the Unicode Consortium indefinitely without having to sign a CLA for each separate contribution.

IP Claimants in Scripts & Characters

As noted above, if there are any persons or entities who have, may have, or claim intellectual property rights (copyright, design, or patent rights) in a proposed character or script itself, then the Consortium requires two things of all such IP owners/claimants: (i) that they sign a Unicode CLA or other appropriate license agreement, and (ii) that they provide a formal written endorsement of the proposal. For instructions on how to sign a standard Unicode CLA, please see above, as well as the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies. To provide the required written endorsement from an IP owner/claimant in the proposed characters/scripts, please send an email to script-proposals@unicode.org from an email account that is identifiable as that of the IP owner/claimant and provide the endorsement of the proposal, clearly identifying the proposal by name, date, and author(s).

This will be an unusual scenario – the vast majority of scripts and characters that are in scope for encoding in the Unicode Standard are generally not subject to intellectual property protection for a variety of reasons. However, “fictional” languages/scripts, such as Elvish from Lord of the Rings, may be subject to copyright protection depending on the particular circumstances and jurisdiction. Additionally, there are some scripts (whether fictional or not) in which the script creator expressly claims copyright or other IP rights and/or has registered such rights.

Whether “fictional” and “created” languages/scripts are in fact subject to intellectual property protection is disputed by some and is not an area of well-settled law around the world. The Consortium acknowledges that there is no clear consensus on these questions in every jurisdiction. Nevertheless, in the interests of making Unicode standards, specifications, data, and software as widely and freely available as possible, it is Consortium policy that a CLA or similar license is required in these cases.

Proposers are required to identify any such potential IP owners or claimants in their proposals and should obtain the formal endorsement of such owners/claimants. The Consortium will not consider proposals that are not endorsed in writing by all IP claimants in the proposed characters/scripts. The Consortium does not have the resources to research and vet prior IP rights, and in cases where a proposal is not endorsed in writing by all IP claimants, and/or fails to provide sufficient information regarding IP rights/claims, the Consortium will have little choice but to decline to encode.

When potential IP owners in the script/characters are identified, the Consortium will need to review the circumstances and consider whether a standard CLA or other similar license best meets the needs for encoding. Proposers and IP claimants should provide as much information as possible about claimed IP rights in such cases to facilitate the Consortium’s review and to increase the chances that the Consortium will be able to encode.

Summary of Requirements for Submission

The submission must include all of the following information:

1. The completed Proposal Summary Form, which (in summary) requires the following:

  • the repertoire, including proposed character names;
  • the name and contact information for the company or individual who will provide a computerized font (TrueType or PostScript) for publication of the standard (see Font Submissions Policy for further information regarding font requirements);
  • references to dictionaries and descriptive texts establishing authoritative information;
  • names and addresses of appropriate contacts within national body or user organizations;
  • the context within which the proposed characters are used (for example, current, historical, and so on);
  • especially for sporadic additions, what similarities or relationships the proposed characters bear to existing characters already encoded in the standard.

2. All additional relevant information for your proposal as described above in the Proposal Guidelines.

3. All information required by the above Legal & Licensing Requirements, namely,

  • names and contact information (country of residence, email address, and website address (if one exists)) for all of the following:
    • all proposal authors;
    • all proposal sponsors/endorsers;
    • any individual or entity who may own or claim intellectual property rights (copyright, design, or patent rights) in the proposed scripts/characters themselves, together with all available information regarding the nature and extent of such IP rights (the more information you can provide in this regard, the better); and
  • for each proposal author, an affirmation that
    • the author has signed an Individual CLA, identifying whether the CLA was signed in GitHub or in PDF; or
    • the author’s contribution is owned by their employer and is covered by their employer’s existing Corporate CLA on file with the Consortium, identifying the employer by company name so that the existence of the Corporate CLA can be verified. (If you wish to check whether your employer has signed a Unicode CLA, see the Public Unicode Corporate CLA List.)

The foregoing legal information should be provided only in the body of the email you send to script-proposals@unicode.org forwarding the proposal documents, not in the proposal documentation itself. Proposal documentation may be publicly posted in the UTC Document Registry when the proposal is formally forwarded to the UTC for consideration, so personally identifiable and other legal information should be kept out of it. If you do include such information in the proposal documentation, you consent to its publication in the UTC Document Registry.

All proposals (whether or not successful) and related materials will be retained by the Unicode Consortium as a matter of record and may be used for any legitimate Consortium purpose subject to the Unicode Consortium Intellectual Property, Licensing & Technical Contribution Policies.

Proposal Review Process

The international standardization of entire scripts requires a significant effort on the sponsor's part. It frequently takes years to move from an initial draft to final standardization, particularly because of the requirements to synchronize proposals with the work done in the ISO committee responsible for the development of ISO/IEC 10646.

Experience has shown that it is often helpful to discuss preliminary proposals before submitting a detailed proposal. One option is to become a member of the Unicode Consortium and submit the proposal to the members-only email list. Alternatively, sponsors can contact UC Berkeley’s Script Encoding Initiative for initial review.

Each proposal received will be evaluated initially by technical officers of the Unicode Consortium and the result of this initial evaluation will be communicated to the sponsor(s) of the proposal. Once a proposal passes this initial screening, it will be reviewed by the Unicode Technical Committee.

Sponsors, particularly of entire scripts, should be prepared to become involved at various times throughout the process, for example by revising their proposals more than once, collecting further detailed information, organizing online discussions or meetings to dispel controversy, or answering questions posed by committees or national bodies. Without such involvement, any proposal of more than a few characters is unlikely to be successful in the long run.

Sponsors can monitor the further progress of their proposals via the public UTC minutes as well as the Proposed New Characters -- Pipeline Table page.

Examples

Many good proposals can be found in the UTC document register. Thesaurus Linguae Graecae has prepared a number of successful proposals.

For people interested in proposing a single symbol or a small set of symbols for encoding, there are also many successful proposals in the UTC document register. For example, see the proposal for power symbols.

Interim Solutions

There are ways for programmers and scholarly organizations to make use of Unicode character encoding, even if the script they want to use or transmit is not yet (or may never be) part of the Unicode Standard. Individual groups that make use of rare scripts or special characters can reach a private agreement about interchange and set aside part of the Private Use Area to encode their private set of characters. Individuals with interests in rare scripts or materials relating to them may sometimes be contacted through an electronic mail list which the Consortium maintains. For information about these mail lists, please contact the Unicode office.
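Groups that adopt a private agreement often want to detect where Private Use Area code points occur in their data. The following is a minimal sketch, assuming the three private-use ranges defined by the Unicode Standard (U+E000..U+F8FF in the Basic Multilingual Plane, plus planes 15 and 16); the helper names `is_private_use` and `find_private_use` are illustrative only.

```python
import unicodedata

# Private-use ranges: the BMP Private Use Area and the two
# supplementary private-use planes (planes 15 and 16).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(ch):
    """Return True if ch is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

def find_private_use(text):
    """List the private-use code points in text, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]

# A string that uses U+E001 under some private agreement.
sample = "a\ue001b"
print(find_private_use(sample))  # ['U+E001']

# Private-use code points carry the general category Co.
print(unicodedata.category("\ue001"))  # Co
```

Because the meaning of a private-use code point exists only in the agreement between the parties, a report like this is mainly useful for flagging data that will not be interpretable outside that group.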

Sending Proposals

To send completed proposals or to make further inquiries, please see the Document Submission Details page.

All proposals are required to be in one of the following forms:

  • PDF format (preferred)
  • HTML along with any needed GIF or JPEG images (a ZIP file or TAR archive should be made, including all of the required files)