LGR for unspecified language | Selected-recommended-IdentifierType-in-MSR-but-not-in-RefLGR |
---|---|
This document is mechanically formatted from the above XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
Date | 2025-01-02 |
---|---|
LGR Version | 16.0.0 |
Unicode Version | 16.0.0 |
Description
Partially updates L2/19-329R
Characters recommended in both UTS#39 and MSR but excluded from the Root Zone or Reference LGR
This document has been submitted as a UTC document. For convenience in documenting the character list it is presented using an LGR template format. A few minor details of the boilerplate in that template may not be applicable in this context and should be disregarded.
The collection comprises 274 characters from the [MSR] that are recommended in UTS#39 but are not part of the Reference LGR [RefLGR], as well as the uppercase equivalents for 88 of them (Latin), plus 70 decimal digits excluded from the RefLGR, for a total of 432 characters.
Recommendation
These 432 characters should be considered Uncommon_Use, based on the fact that the expert teams charged with reviewing them for the ICANN Root Zone LGR and Reference LGR for the Second Level could not come up with evidence that they are used in common everyday writing, even for minority languages in reasonably widespread use. Consequently, they declined to include them in the respective LGRs.
Background
There are about a thousand non-Han characters with Identifier_Type Recommended that should be reclassified because they appear to fail reasonable criteria for being needed in identifiers. They come in two sets. For one set, an independent analysis [MSR] has found indications that they should have been considered Uncommon_Use, Obsolete or Technical based on information available at the time of encoding. That set is discussed in another document. The second set contains characters that were tentatively retained as Recommended in the [MSR] but upon further review by local expert teams from the [RZ-LGR] project were found to not be needed for any language or minority language in reasonably widespread use. That determination started with the [EGIDS] classification as a proxy but made further adjustments in expert review.
This analysis was carried out for the purposes of defining the repertoire for IDN Top Level Domain names for the DNS Root Zone. There are some restrictions that are specific to the Root Zone, such as a prohibition on digits, so a follow-on effort determined how to relax these restrictions in a manner appropriate for the needs of Second-Level Domains. This resulted in the Second-Level Reference Label Generation Rules [RefLGR]. The characters listed in this document are those that were not added to the [RefLGR], for lack of evidence of their use in everyday common writing for any language or minority language in vigorous and reasonably widespread use. Also listed are their uppercase equivalents as well as any native digit sets that were not added to the [RefLGR].
The implication is that any character excluded from the Reference LGR for lack of documented or identifiable usage should be considered Uncommon_Use for Unicode's default identifiers until independent evidence to the contrary is produced. Absent a demonstrated use case, it does not seem helpful to continue suggesting that these characters be supported as Recommended. This also applies to some of the sets of native digits, in cases where local experts considered them obsolete for the purpose of identifiers.
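The default rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical and are not taken from [MSR], [RefLGR], or UTS#39.

```python
def proposed_identifier_type(current_type: str,
                             in_ref_lgr: bool,
                             independent_evidence: bool = False) -> str:
    """Hypothetical helper illustrating the proposed default.

    current_type         -- the Identifier_Type currently published for the character
    in_ref_lgr           -- True if the character is in the Reference LGR
                            (standalone or as part of a sequence)
    independent_evidence -- True if documented common use has been shown elsewhere
    """
    # Only characters currently classified as Recommended are affected.
    if current_type != "Recommended":
        return current_type
    # Inclusion in the RefLGR, or independent evidence, keeps the status.
    if in_ref_lgr or independent_evidence:
        return "Recommended"
    # Otherwise the proposed default is Uncommon_Use.
    return "Uncommon_Use"


# Example: a Recommended character absent from the RefLGR, with no other evidence.
print(proposed_identifier_type("Recommended", in_ref_lgr=False))
```

In this sketch, a character that occurs in the Reference LGR only as part of a sequence is treated as `in_ref_lgr=True`, so it keeps its Recommended status.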
Arriving at a precise cutoff for Uncommon_Use is difficult because there is no single source or perfect information on the use of writing systems, and the details of such use are changing over time. Accordingly, this document suggests that the UTC should consider the published results of the cited research as one of the better sources of information available and only deviate from it on the basis of even better information.
All decisions for the classification of characters in [MSR], or inclusion in [RZ-LGR] and [RefLGR], are documented and sourced on the character level. The same is not true for Unicode's classification, so it is not easy to verify any of the decisions that underlie the classification published in UTS#39. By first making the alignment proposed here, and then carefully documenting deviations, a positive side effect might be that the classification overall becomes more transparent and reviewable.
Special Considerations
The [RZ-LGR] and [RefLGR] exclude the Bopomofo script, considering the entire script special use as it tends to be used almost exclusively in education. This could be addressed by either changing the status of the script in UAX31 to Limited_Use or by marking the entire set of Bopomofo characters as Technical. (This is not reflected in the list of characters in this document.)
No definite recommendations can be made for the Tibetan script. It is considered by ICANN as eligible for the Root Zone in principle, but work on defining the label generation rules has faced some difficulties and has not commenced. It might be reasonable to reflect that uncertainty by also not giving these characters Identifier_Type Recommended until some body, project, or group has created a definite analysis of this script for identifier purposes. (Tibetan characters have been excluded from the list of characters in this document.)
Arabic combining marks are categorically excluded from domain names; see also RFC5564. Consequently, they should not be Recommended by Unicode. If it is felt that Uncommon_Use is not the best classification, then Inclusion or Technical might be more appropriate.
Root Zone and Reference LGRs
For further background on the DNS Root Zone and Second-Level Reference LGR see the cited references and links therein.
Additional Notes
- U+0931 ऱ DEVANAGARI LETTER RRA is part of the Root Zone and Reference LGR via sequence (does not occur standalone, but should be retained as Recommended)
- U+09BC ় BENGALI SIGN NUKTA is part of the Root Zone and Reference LGR via sequence (does not occur standalone, but should be retained as Recommended)
- U+0DA6 ඦ SINHALA LETTER SANYAKA JAYANNA is part of the Root Zone and Reference LGR via sequence (does not occur standalone, but should be retained as Recommended)
- U+0E45 ๅ THAI CHARACTER LAKKHANGYAO is part of the Root Zone and Reference LGR via sequence (does not occur standalone, but should be retained as Recommended)
- U+1063 ၣ MYANMAR TONE MARK SGAW KAREN HATHI is part of the Root Zone and Reference LGR via sequence (does not occur standalone, but should be retained as Recommended)
Note: All characters have tags matching their Identifier_Type values, except that uppercase equivalents are tagged as Uppercase. A comment indicates the nature of the exclusion from the [RefLGR], typically "Not in documented common use". The definition of IDNs excludes uppercase characters; however, for case pairs the analysis for the lowercase letter is treated as applicable. Native digit sets excluded from the [RefLGR] based on information that their use is not preferred in that context are listed with their Identifier_Type and a comment indicating their exclusion from the [RefLGR].
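As a rough illustration of the tagging scheme in the note above, the following sketch parses a few rows in the pipe-delimited layout used by the repertoire table, tallies them by tag, and pairs an Uppercase row with its lowercase partner. The helper names (`parse_row`, `lowercase_partner`) are hypothetical, and the four sample rows are copied from the table in this document; nothing here is part of the LGR tooling itself.

```python
from collections import Counter

# Sample rows copied from the repertoire table (pipe-delimited, seven columns).
ROWS = """\
U+0114 | Ĕ | Latin | LATIN CAPITAL LETTER E WITH BREVE | [100] | Uppercase | |
U+0115 | ĕ | Latin | LATIN SMALL LETTER E WITH BREVE | [100] | Recommended | Not in documented common use |
U+01F0 | ǰ | Latin | LATIN SMALL LETTER J WITH CARON | [100] | Recommended | Not in documented common use |
U+0A66 | ੦ | Gurmukhi | GURMUKHI DIGIT ZERO | [100] | Recommended | Native digits not in common use |
"""


def parse_row(line: str) -> dict:
    """Split one table row into named fields (hypothetical helper)."""
    cells = [cell.strip() for cell in line.split("|")]
    code_point, _glyph, script, name, _ref, tag, comment = cells[:7]
    return {
        "cp": int(code_point[2:], 16),  # drop the "U+" prefix
        "script": script,
        "name": name,
        "tag": tag,
        "comment": comment,
    }


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
by_cp = {row["cp"]: row for row in rows}
tag_counts = Counter(row["tag"] for row in rows)


def lowercase_partner(row: dict):
    """Return the row for the lowercase counterpart of an Uppercase row, if listed."""
    lower = chr(row["cp"]).lower()
    if len(lower) != 1:
        return None
    return by_cp.get(ord(lower))


print(tag_counts)
print(lowercase_partner(by_cp[0x0114])["name"])
```

Not every lowercase letter has a single-code-point uppercase form. U+01F0, for example, uppercases to a two-code-point sequence under Unicode full case mapping, which is consistent with the table listing no Uppercase row for it.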
Discussion and Review
Domain names are an important and deliberately conservative set of identifiers. That said, there may be other classes of identifiers that don't require the same level of restrictions, so this proposal should not be understood to suggest that default Identifiers must be restricted to only those characters that are being recommended for IDNs. Rather, the purpose is to bring the facts discovered during the development of the IDN repertoire for the DNS Root Zone and the [RefLGR] to the attention of the Unicode Technical Committee, so that characters that were classified Recommended can be given additional scrutiny before confirming their status.
As review progresses, a number of characters have been identified that may well have documented use:
- U+0671 ٱ ARABIC LETTER ALEF WASLA - this letter is considered to be "an important Quranic character" (which would make it Technical, but not Uncommon_Use). It is also claimed to be used in a newly invented orthography for the Luri language in Iran.
Other issues
Combining marks: where combining marks are excluded, but needed for decompositions (such as U+0654 ٔ ), it was proposed to focus on the NFC format for Identifier_Type, documenting that combining characters may be marked as Uncommon_Use even when they are in the NFD version of a modern language's exemplar characters.
Combining marks and Arabic Script: the Internet Architecture Board [IAB] has issued a statement referencing this issue. Please also see the “Proposal for Arabic Script Root Zone LGR”, [Proposal-Arabic].
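The NFC-focused treatment of combining marks described above can be illustrated with Python's standard `unicodedata` module. The choice of U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE as the sample letter is an assumption made for illustration (it is not discussed in this document), and the helper name is hypothetical.

```python
import unicodedata

ALEF_WITH_HAMZA_ABOVE = "\u0623"  # precomposed letter (illustrative choice)
HAMZA_ABOVE = "\u0654"            # combining mark cited in the text


def mark_appears_only_in_nfd(text: str, mark: str) -> bool:
    """True if `mark` is present in the NFD form of `text` but not in its NFC form."""
    nfd = unicodedata.normalize("NFD", text)
    nfc = unicodedata.normalize("NFC", text)
    return mark in nfd and mark not in nfc


# The precomposed letter decomposes to a base letter plus U+0654 under NFD,
# and recomposes to a single code point under NFC.
nfd_form = unicodedata.normalize("NFD", ALEF_WITH_HAMZA_ABOVE)
print([f"U+{ord(ch):04X}" for ch in nfd_form])
print(mark_appears_only_in_nfd(ALEF_WITH_HAMZA_ABOVE, HAMZA_ABOVE))
```

Under this reading, an identifier kept in NFC never contains the standalone combining mark for such a letter, so classifying the mark itself as Uncommon_Use would not affect the precomposed letter.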
Contributors
This excerpt was prepared by Asmus Freytag, based on published data found in [RefLGR] and reference information from [MSR]. For details on the process and contributors to those projects, see [RefLGR-Overview], in particular, Section 1, “Overview” and Section 6, “Contributors”. Michel Suignard and Roozbeh Pournader have contributed feedback.
Repertoire
Repertoire Summary
Number of elements in repertoire | 432 |
---|---|
Longest code point sequence | 1 |
Repertoire by Code Point
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name column are extracted from the Unicode character database. Where a comment in the original LGR is equal to the character name, it has been suppressed.
See also the legend provided below the table.
Code Point | Glyph | Script | Name | Ref | Tags | Comment |
---|---|---|---|---|---|---|
U+0114 | Ĕ | Latin | LATIN CAPITAL LETTER E WITH BREVE | [100] | Uppercase | |
U+0115 | ĕ | Latin | LATIN SMALL LETTER E WITH BREVE | [100] | Recommended | Not in documented common use |
U+012C | Ĭ | Latin | LATIN CAPITAL LETTER I WITH BREVE | [100] | Uppercase | |
U+012D | ĭ | Latin | LATIN SMALL LETTER I WITH BREVE | [100] | Recommended | Not in documented common use |
U+014E | Ŏ | Latin | LATIN CAPITAL LETTER O WITH BREVE | [100] | Uppercase | |
U+014F | ŏ | Latin | LATIN SMALL LETTER O WITH BREVE | [100] | Recommended | Not in documented common use |
U+0156 | Ŗ | Latin | LATIN CAPITAL LETTER R WITH CEDILLA | [100] | Uppercase | |
U+0157 | ŗ | Latin | LATIN SMALL LETTER R WITH CEDILLA | [100] | Recommended | Not in documented common use |
U+0162 | Ţ | Latin | LATIN CAPITAL LETTER T WITH CEDILLA | [100] | Uppercase | |
U+0163 | ţ | Latin | LATIN SMALL LETTER T WITH CEDILLA | [100] | Recommended | Not in documented common use |
U+01D5 | Ǖ | Latin | LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON | [100] | Uppercase | |
U+01D6 | ǖ | Latin | LATIN SMALL LETTER U WITH DIAERESIS AND MACRON | [100] | Recommended | Not in documented common use |
U+01D7 | Ǘ | Latin | LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE | [100] | Uppercase | |
U+01D8 | ǘ | Latin | LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE | [100] | Recommended | Not in documented common use |
U+01D9 | Ǚ | Latin | LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON | [100] | Uppercase | |
U+01DA | ǚ | Latin | LATIN SMALL LETTER U WITH DIAERESIS AND CARON | [100] | Recommended | Not in documented common use |
U+01DB | Ǜ | Latin | LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE | [100] | Uppercase | |
U+01DC | ǜ | Latin | LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE | [100] | Recommended | Not in documented common use |
U+01DE | Ǟ | Latin | LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON | [100] | Uppercase | |
U+01DF | ǟ | Latin | LATIN SMALL LETTER A WITH DIAERESIS AND MACRON | [100] | Recommended | Not in documented common use |
U+01E0 | Ǡ | Latin | LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON | [100] | Uppercase | |
U+01E1 | ǡ | Latin | LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON | [100] | Recommended | Not in documented common use |
U+01E2 | Ǣ | Latin | LATIN CAPITAL LETTER AE WITH MACRON | [100] | Uppercase | |
U+01E3 | ǣ | Latin | LATIN SMALL LETTER AE WITH MACRON | [100] | Recommended | Not in documented common use |
U+01EA | Ǫ | Latin | LATIN CAPITAL LETTER O WITH OGONEK | [100] | Uppercase | |
U+01EB | ǫ | Latin | LATIN SMALL LETTER O WITH OGONEK | [100] | Recommended | Not in documented common use |
U+01EC | Ǭ | Latin | LATIN CAPITAL LETTER O WITH OGONEK AND MACRON | [100] | Uppercase | |
U+01ED | ǭ | Latin | LATIN SMALL LETTER O WITH OGONEK AND MACRON | [100] | Recommended | Not in documented common use |
U+01F0 | ǰ | Latin | LATIN SMALL LETTER J WITH CARON | [100] | Recommended | Not in documented common use |
U+01F4 | Ǵ | Latin | LATIN CAPITAL LETTER G WITH ACUTE | [100] | Uppercase | |
U+01F5 | ǵ | Latin | LATIN SMALL LETTER G WITH ACUTE | [100] | Recommended | Not in documented common use |
U+01F8 | Ǹ | Latin | LATIN CAPITAL LETTER N WITH GRAVE | [100] | Uppercase | |
U+01F9 | ǹ | Latin | LATIN SMALL LETTER N WITH GRAVE | [100] | Recommended | Not in documented common use |
U+01FA | Ǻ | Latin | LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE | [100] | Uppercase | |
U+01FB | ǻ | Latin | LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE | [100] | Recommended | Not in documented common use |
U+01FC | Ǽ | Latin | LATIN CAPITAL LETTER AE WITH ACUTE | [100] | Uppercase | |
U+01FD | ǽ | Latin | LATIN SMALL LETTER AE WITH ACUTE | [100] | Recommended | Not in documented common use |
U+01FE | Ǿ | Latin | LATIN CAPITAL LETTER O WITH STROKE AND ACUTE | [100] | Uppercase | |
U+01FF | ǿ | Latin | LATIN SMALL LETTER O WITH STROKE AND ACUTE | [100] | Recommended | Not in documented common use |
U+021E | Ȟ | Latin | LATIN CAPITAL LETTER H WITH CARON | [100] | Uppercase | |
U+021F | ȟ | Latin | LATIN SMALL LETTER H WITH CARON | [100] | Recommended | Not in documented common use |
U+0226 | Ȧ | Latin | LATIN CAPITAL LETTER A WITH DOT ABOVE | [100] | Uppercase | |
U+0227 | ȧ | Latin | LATIN SMALL LETTER A WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+0228 | Ȩ | Latin | LATIN CAPITAL LETTER E WITH CEDILLA | [100] | Uppercase | |
U+0229 | ȩ | Latin | LATIN SMALL LETTER E WITH CEDILLA | [100] | Recommended | Not in documented common use |
U+022A | Ȫ | Latin | LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON | [100] | Uppercase | |
U+022B | ȫ | Latin | LATIN SMALL LETTER O WITH DIAERESIS AND MACRON | [100] | Recommended | Not in documented common use |
U+022C | Ȭ | Latin | LATIN CAPITAL LETTER O WITH TILDE AND MACRON | [100] | Uppercase | |
U+022D | ȭ | Latin | LATIN SMALL LETTER O WITH TILDE AND MACRON | [100] | Recommended | Not in documented common use |
U+022E | Ȯ | Latin | LATIN CAPITAL LETTER O WITH DOT ABOVE | [100] | Uppercase | |
U+022F | ȯ | Latin | LATIN SMALL LETTER O WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+0230 | Ȱ | Latin | LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON | [100] | Uppercase | |
U+0231 | ȱ | Latin | LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON | [100] | Recommended | Not in documented common use |
U+0232 | Ȳ | Latin | LATIN CAPITAL LETTER Y WITH MACRON | [100] | Uppercase | |
U+0233 | ȳ | Latin | LATIN SMALL LETTER Y WITH MACRON | [100] | Recommended | Not in documented common use |
U+0400 | Ѐ | Cyrillic | CYRILLIC CAPITAL LETTER IE WITH GRAVE | [100] | Uppercase | |
U+040D | Ѝ | Cyrillic | CYRILLIC CAPITAL LETTER I WITH GRAVE | [100] | Uppercase | |
U+0450 | ѐ | Cyrillic | CYRILLIC SMALL LETTER IE WITH GRAVE | [100] | Recommended | Not in documented common use |
U+045D | ѝ | Cyrillic | CYRILLIC SMALL LETTER I WITH GRAVE | [100] | Recommended | Not in documented common use |
U+04C1 | Ӂ | Cyrillic | CYRILLIC CAPITAL LETTER ZHE WITH BREVE | [100] | Uppercase | |
U+04C2 | ӂ | Cyrillic | CYRILLIC SMALL LETTER ZHE WITH BREVE | [100] | Recommended | Not in documented common use |
U+04CB | Ӌ | Cyrillic | CYRILLIC CAPITAL LETTER KHAKASSIAN CHE | [100] | Uppercase | |
U+04CC | ӌ | Cyrillic | CYRILLIC SMALL LETTER KHAKASSIAN CHE | [100] | Recommended | Not in documented common use |
U+04DA | Ӛ | Cyrillic | CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS | [100] | Uppercase | |
U+04DB | ӛ | Cyrillic | CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+04EA | Ӫ | Cyrillic | CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS | [100] | Uppercase | |
U+04EB | ӫ | Cyrillic | CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+04EC | Ӭ | Cyrillic | CYRILLIC CAPITAL LETTER E WITH DIAERESIS | [100] | Uppercase | |
U+04ED | ӭ | Cyrillic | CYRILLIC SMALL LETTER E WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+05B4 | ִ | Hebrew | HEBREW POINT HIRIQ | [100] | Recommended | Not in documented common use |
U+05F0 | װ | Hebrew | HEBREW LIGATURE YIDDISH DOUBLE VAV | [100] | Recommended | Not in documented common use |
U+05F1 | ױ | Hebrew | HEBREW LIGATURE YIDDISH VAV YOD | [100] | Recommended | Not in documented common use |
U+05F2 | ײ | Hebrew | HEBREW LIGATURE YIDDISH DOUBLE YOD | [100] | Recommended | Not in documented common use |
U+064B | ً | Inherited | ARABIC FATHATAN | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+064C | ٌ | Inherited | ARABIC DAMMATAN | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+064D | ٍ | Inherited | ARABIC KASRATAN | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+064E | َ | Inherited | ARABIC FATHA | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+064F | ُ | Inherited | ARABIC DAMMA | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0650 | ِ | Inherited | ARABIC KASRA | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0651 | ّ | Inherited | ARABIC SHADDA | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0652 | ْ | Inherited | ARABIC SUKUN | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0654 | ٔ | Inherited | ARABIC HAMZA ABOVE | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0655 | ٕ | Inherited | ARABIC HAMZA BELOW | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0670 | ٰ | Inherited | ARABIC LETTER SUPERSCRIPT ALEF | [100] | Recommended | Arabic combining marks are categorically excluded from domain names |
U+0671 | ٱ | Arabic | ARABIC LETTER ALEF WASLA | [100] | Recommended | Not in documented common use |
U+0674 | ٴ | Arabic | ARABIC LETTER HIGH HAMZA | [100] | Recommended | Not in documented common use |
U+0682 | ڂ | Arabic | ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE | [100] | Recommended | Not in documented common use |
U+0690 | ڐ | Arabic | ARABIC LETTER DAL WITH FOUR DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+0692 | ڒ | Arabic | ARABIC LETTER REH WITH SMALL V | [100] | Recommended | Not in documented common use |
U+0694 | ڔ | Arabic | ARABIC LETTER REH WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+069B | ڛ | Arabic | ARABIC LETTER SEEN WITH THREE DOTS BELOW | [100] | Recommended | Not in documented common use |
U+069C | ڜ | Arabic | ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+069D | ڝ | Arabic | ARABIC LETTER SAD WITH TWO DOTS BELOW | [100] | Recommended | Not in documented common use |
U+069E | ڞ | Arabic | ARABIC LETTER SAD WITH THREE DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+06A1 | ڡ | Arabic | ARABIC LETTER DOTLESS FEH | [100] | Recommended | Not in documented common use |
U+06A3 | ڣ | Arabic | ARABIC LETTER FEH WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+06A5 | ڥ | Arabic | ARABIC LETTER FEH WITH THREE DOTS BELOW | [100] | Recommended | Not in documented common use |
U+06B2 | ڲ | Arabic | ARABIC LETTER GAF WITH TWO DOTS BELOW | [100] | Recommended | Not in documented common use |
U+06B4 | ڴ | Arabic | ARABIC LETTER GAF WITH THREE DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+06B6 | ڶ | Arabic | ARABIC LETTER LAM WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+06B7 | ڷ | Arabic | ARABIC LETTER LAM WITH THREE DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+06B8 | ڸ | Arabic | ARABIC LETTER LAM WITH THREE DOTS BELOW | [100] | Recommended | Not in documented common use |
U+06B9 | ڹ | Arabic | ARABIC LETTER NOON WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+06BF | ڿ | Arabic | ARABIC LETTER TCHEH WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+06C5 | ۅ | Arabic | ARABIC LETTER KIRGHIZ OE | [100] | Recommended | Not in documented common use |
U+06C7 | ۇ | Arabic | ARABIC LETTER U | [100] | Recommended | Not in documented common use |
U+06C8 | ۈ | Arabic | ARABIC LETTER YU | [100] | Recommended | Not in documented common use |
U+06C9 | ۉ | Arabic | ARABIC LETTER KIRGHIZ YU | [100] | Recommended | Not in documented common use |
U+06CA | ۊ | Arabic | ARABIC LETTER WAW WITH TWO DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+06D3 | ۓ | Arabic | ARABIC LETTER YEH BARREE WITH HAMZA ABOVE | [100] | Recommended | Not in documented common use |
U+06EE | ۮ | Arabic | ARABIC LETTER DAL WITH INVERTED V | [100] | Recommended | Not in documented common use |
U+06EF | ۯ | Arabic | ARABIC LETTER REH WITH INVERTED V | [100] | Recommended | Not in documented common use |
U+06FA | ۺ | Arabic | ARABIC LETTER SHEEN WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+06FB | ۻ | Arabic | ARABIC LETTER DAD WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+06FC | ۼ | Arabic | ARABIC LETTER GHAIN WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+06FF | ۿ | Arabic | ARABIC LETTER HEH WITH INVERTED V | [100] | Recommended | Not in documented common use |
U+0750 | ݐ | Arabic | ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW | [100] | Recommended | Not in documented common use |
U+0753 | ݓ | Arabic | ARABIC LETTER BEH WITH THREE DOTS POINTING UPWARDS BELOW AND TWO DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+0754 | ݔ | Arabic | ARABIC LETTER BEH WITH TWO DOTS BELOW AND DOT ABOVE | [100] | Recommended | Not in documented common use |
U+0755 | ݕ | Arabic | ARABIC LETTER BEH WITH INVERTED SMALL V BELOW | [100] | Recommended | Not in documented common use |
U+0757 | ݗ | Arabic | ARABIC LETTER HAH WITH TWO DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+0758 | ݘ | Arabic | ARABIC LETTER HAH WITH THREE DOTS POINTING UPWARDS BELOW | [100] | Recommended | Not in documented common use |
U+0759 | ݙ | Arabic | ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW AND SMALL TAH | [100] | Recommended | Not in documented common use |
U+075A | ݚ | Arabic | ARABIC LETTER DAL WITH INVERTED SMALL V BELOW | [100] | Recommended | Not in documented common use |
U+075B | ݛ | Arabic | ARABIC LETTER REH WITH STROKE | [100] | Recommended | Not in documented common use |
U+075C | ݜ | Arabic | ARABIC LETTER SEEN WITH FOUR DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+075D | ݝ | Arabic | ARABIC LETTER AIN WITH TWO DOTS ABOVE | [100] | Recommended | Not in documented common use |
U+075E | ݞ | Arabic | ARABIC LETTER AIN WITH THREE DOTS POINTING DOWNWARDS ABOVE | [100] | Recommended | Not in documented common use |
U+075F | ݟ | Arabic | ARABIC LETTER AIN WITH TWO DOTS VERTICALLY ABOVE | [100] | Recommended | Not in documented common use |
U+0761 | ݡ | Arabic | ARABIC LETTER FEH WITH THREE DOTS POINTING UPWARDS BELOW | [100] | Recommended | Not in documented common use |
U+0764 | ݤ | Arabic | ARABIC LETTER KEHEH WITH THREE DOTS POINTING UPWARDS BELOW | [100] | Recommended | Not in documented common use |
U+0765 | ݥ | Arabic | ARABIC LETTER MEEM WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+0769 | ݩ | Arabic | ARABIC LETTER NOON WITH SMALL V | [100] | Recommended | Not in documented common use |
U+076B | ݫ | Arabic | ARABIC LETTER REH WITH TWO DOTS VERTICALLY ABOVE | [100] | Recommended | Not in documented common use |
U+076C | ݬ | Arabic | ARABIC LETTER REH WITH HAMZA ABOVE | [100] | Recommended | Not in documented common use |
U+076D | ݭ | Arabic | ARABIC LETTER SEEN WITH TWO DOTS VERTICALLY ABOVE | [100] | Recommended | Not in documented common use |
U+0772 | ݲ | Arabic | ARABIC LETTER HAH WITH SMALL ARABIC LETTER TAH ABOVE | [100] | Recommended | Not in documented common use |
U+0773 | ݳ | Arabic | ARABIC LETTER ALEF WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE | [100] | Recommended | Not in documented common use |
U+0774 | ݴ | Arabic | ARABIC LETTER ALEF WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE | [100] | Recommended | Not in documented common use |
U+0775 | ݵ | Arabic | ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE | [100] | Recommended | Not in documented common use |
U+0776 | ݶ | Arabic | ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE | [100] | Recommended | Not in documented common use |
U+0777 | ݷ | Arabic | ARABIC LETTER FARSI YEH WITH EXTENDED ARABIC-INDIC DIGIT FOUR BELOW | [100] | Recommended | Not in documented common use |
U+0778 | ݸ | Arabic | ARABIC LETTER WAW WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE | [100] | Recommended | Not in documented common use |
U+0779 | ݹ | Arabic | ARABIC LETTER WAW WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE | [100] | Recommended | Not in documented common use |
U+077A | ݺ | Arabic | ARABIC LETTER YEH BARREE WITH EXTENDED ARABIC-INDIC DIGIT TWO ABOVE | [100] | Recommended | Not in documented common use |
U+077B | ݻ | Arabic | ARABIC LETTER YEH BARREE WITH EXTENDED ARABIC-INDIC DIGIT THREE ABOVE | [100] | Recommended | Not in documented common use |
U+077C | ݼ | Arabic | ARABIC LETTER HAH WITH EXTENDED ARABIC-INDIC DIGIT FOUR BELOW | [100] | Recommended | Not in documented common use |
U+077D | ݽ | Arabic | ARABIC LETTER SEEN WITH EXTENDED ARABIC-INDIC DIGIT FOUR ABOVE | [100] | Recommended | Not in documented common use |
U+08A1 | ࢡ | Arabic | ARABIC LETTER BEH WITH HAMZA ABOVE | [100] | Recommended | Not in documented common use |
U+08AA | ࢪ | Arabic | ARABIC LETTER REH WITH LOOP | [100] | Recommended | Not in documented common use |
U+08AB | ࢫ | Arabic | ARABIC LETTER WAW WITH DOT WITHIN | [100] | Recommended | Not in documented common use |
U+08AC | ࢬ | Arabic | ARABIC LETTER ROHINGYA YEH | [100] | Recommended | Not in documented common use |
U+0904 | ऄ | Devanagari | DEVANAGARI LETTER SHORT A | [100] | Recommended | Not in documented common use |
U+090C | ऌ | Devanagari | DEVANAGARI LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+0929 | ऩ | Devanagari | DEVANAGARI LETTER NNNA | [100] | Recommended | Not in documented common use |
U+0934 | ऴ | Devanagari | DEVANAGARI LETTER LLLA | [100] | Recommended | Not in documented common use |
U+0944 | ॄ | Devanagari | DEVANAGARI VOWEL SIGN VOCALIC RR | [100] | Recommended | Not in documented common use |
U+0979 | ॹ | Devanagari | DEVANAGARI LETTER ZHA | [100] | Recommended | Not in documented common use |
U+097A | ॺ | Devanagari | DEVANAGARI LETTER HEAVY YA | [100] | Recommended | Not in documented common use |
U+098C | ঌ | Bengali | BENGALI LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+09D7 | ৗ | Bengali | BENGALI AU LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0A03 | ਃ | Gurmukhi | GURMUKHI SIGN VISARGA | [100] | Recommended | Not in documented common use |
U+0A66 | ੦ | Gurmukhi | GURMUKHI DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0A67 | ੧ | Gurmukhi | GURMUKHI DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0A68 | ੨ | Gurmukhi | GURMUKHI DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0A69 | ੩ | Gurmukhi | GURMUKHI DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0A6A | ੪ | Gurmukhi | GURMUKHI DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0A6B | ੫ | Gurmukhi | GURMUKHI DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0A6C | ੬ | Gurmukhi | GURMUKHI DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0A6D | ੭ | Gurmukhi | GURMUKHI DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0A6E | ੮ | Gurmukhi | GURMUKHI DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0A6F | ੯ | Gurmukhi | GURMUKHI DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0A72 | ੲ | Gurmukhi | GURMUKHI IRI | [100] | Recommended | Not in documented common use |
U+0A73 | ੳ | Gurmukhi | GURMUKHI URA | [100] | Recommended | Not in documented common use |
U+0A81 | ઁ | Gujarati | GUJARATI SIGN CANDRABINDU | [100] | Recommended | Not in documented common use |
U+0B0C | ଌ | Oriya | ORIYA LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+0B35 | ଵ | Oriya | ORIYA LETTER VA | [100] | Recommended | Not in documented common use |
U+0B57 | ୗ | Oriya | ORIYA AU LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0B66 | ୦ | Oriya | ORIYA DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0B67 | ୧ | Oriya | ORIYA DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0B68 | ୨ | Oriya | ORIYA DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0B69 | ୩ | Oriya | ORIYA DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0B6A | ୪ | Oriya | ORIYA DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0B6B | ୫ | Oriya | ORIYA DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0B6C | ୬ | Oriya | ORIYA DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0B6D | ୭ | Oriya | ORIYA DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0B6E | ୮ | Oriya | ORIYA DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0B6F | ୯ | Oriya | ORIYA DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0BD7 | ௗ | Tamil | TAMIL AU LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0BE6 | ௦ | Tamil | TAMIL DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0BE7 | ௧ | Tamil | TAMIL DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0BE8 | ௨ | Tamil | TAMIL DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0BE9 | ௩ | Tamil | TAMIL DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0BEA | ௪ | Tamil | TAMIL DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0BEB | ௫ | Tamil | TAMIL DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0BEC | ௬ | Tamil | TAMIL DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0BED | ௭ | Tamil | TAMIL DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0BEE | ௮ | Tamil | TAMIL DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0BEF | ௯ | Tamil | TAMIL DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0C0C | ఌ | Telugu | TELUGU LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+0C31 | ఱ | Telugu | TELUGU LETTER RRA | [100] | Recommended | Not in documented common use |
U+0C55 | ౕ | Telugu | TELUGU LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0C56 | ౖ | Telugu | TELUGU AI LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0C66 | ౦ | Telugu | TELUGU DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0C67 | ౧ | Telugu | TELUGU DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0C68 | ౨ | Telugu | TELUGU DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0C69 | ౩ | Telugu | TELUGU DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0C6A | ౪ | Telugu | TELUGU DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0C6B | ౫ | Telugu | TELUGU DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0C6C | ౬ | Telugu | TELUGU DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0C6D | ౭ | Telugu | TELUGU DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0C6E | ౮ | Telugu | TELUGU DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0C6F | ౯ | Telugu | TELUGU DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0C8C | ಌ | Kannada | KANNADA LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+0CB1 | ಱ | Kannada | KANNADA LETTER RRA | [100] | Recommended | Not in documented common use |
U+0CBC | ಼ | Kannada | KANNADA SIGN NUKTA | [100] | Recommended | Not in documented common use |
U+0CC4 | ೄ | Kannada | KANNADA VOWEL SIGN VOCALIC RR | [100] | Recommended | Not in documented common use |
U+0CD5 | ೕ | Kannada | KANNADA LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0CD6 | ೖ | Kannada | KANNADA AI LENGTH MARK | [100] | Recommended | Not in documented common use |
U+0D0C | ഌ | Malayalam | MALAYALAM LETTER VOCALIC L | [100] | Recommended | Not in documented common use |
U+0D29 | ഩ | Malayalam | MALAYALAM LETTER NNNA | [100] | Recommended | Not in documented common use |
U+0D66 | ൦ | Malayalam | MALAYALAM DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0D67 | ൧ | Malayalam | MALAYALAM DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0D68 | ൨ | Malayalam | MALAYALAM DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0D69 | ൩ | Malayalam | MALAYALAM DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0D6A | ൪ | Malayalam | MALAYALAM DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0D6B | ൫ | Malayalam | MALAYALAM DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0D6C | ൬ | Malayalam | MALAYALAM DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0D6D | ൭ | Malayalam | MALAYALAM DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0D6E | ൮ | Malayalam | MALAYALAM DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0D6F | ൯ | Malayalam | MALAYALAM DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0D8E | ඎ | Sinhala | SINHALA LETTER IRUUYANNA | [100] | Recommended | Not in documented common use |
U+0D9E | ඞ | Sinhala | SINHALA LETTER KANTAJA NAASIKYAYA | [100] | Recommended | Not in documented common use |
U+0DE6 | ෦ | Sinhala | SINHALA LITH DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+0DE7 | ෧ | Sinhala | SINHALA LITH DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+0DE8 | ෨ | Sinhala | SINHALA LITH DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+0DE9 | ෩ | Sinhala | SINHALA LITH DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+0DEA | ෪ | Sinhala | SINHALA LITH DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+0DEB | ෫ | Sinhala | SINHALA LITH DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+0DEC | ෬ | Sinhala | SINHALA LITH DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+0DED | ෭ | Sinhala | SINHALA LITH DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+0DEE | ෮ | Sinhala | SINHALA LITH DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+0DEF | ෯ | Sinhala | SINHALA LITH DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+0E4E | ๎ | Thai | THAI CHARACTER YAMAKKAN | [100] | Recommended | Not in documented common use |
U+0EDE | ໞ | Lao | LAO LETTER KHMU GO | [100] | Recommended | Not in documented common use |
U+0EDF | ໟ | Lao | LAO LETTER KHMU NYO | [100] | Recommended | Not in documented common use |
U+108B | ႋ | Myanmar | MYANMAR SIGN SHAN COUNCIL TONE-2 | [100] | Recommended | Not in documented common use |
U+108C | ႌ | Myanmar | MYANMAR SIGN SHAN COUNCIL TONE-3 | [100] | Recommended | Not in documented common use |
U+108D | ႍ | Myanmar | MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE | [100] | Recommended | Not in documented common use |
U+1090 | ႐ | Myanmar | MYANMAR SHAN DIGIT ZERO | [100] | Recommended | Native digits not in common use |
U+1091 | ႑ | Myanmar | MYANMAR SHAN DIGIT ONE | [100] | Recommended | Native digits not in common use |
U+1092 | ႒ | Myanmar | MYANMAR SHAN DIGIT TWO | [100] | Recommended | Native digits not in common use |
U+1093 | ႓ | Myanmar | MYANMAR SHAN DIGIT THREE | [100] | Recommended | Native digits not in common use |
U+1094 | ႔ | Myanmar | MYANMAR SHAN DIGIT FOUR | [100] | Recommended | Native digits not in common use |
U+1095 | ႕ | Myanmar | MYANMAR SHAN DIGIT FIVE | [100] | Recommended | Native digits not in common use |
U+1096 | ႖ | Myanmar | MYANMAR SHAN DIGIT SIX | [100] | Recommended | Native digits not in common use |
U+1097 | ႗ | Myanmar | MYANMAR SHAN DIGIT SEVEN | [100] | Recommended | Native digits not in common use |
U+1098 | ႘ | Myanmar | MYANMAR SHAN DIGIT EIGHT | [100] | Recommended | Native digits not in common use |
U+1099 | ႙ | Myanmar | MYANMAR SHAN DIGIT NINE | [100] | Recommended | Native digits not in common use |
U+10F7 | ჷ | Georgian | GEORGIAN LETTER YN | [100] | Recommended | Not in documented common use |
U+10F8 | ჸ | Georgian | GEORGIAN LETTER ELIFI | [100] | Recommended | Not in documented common use |
U+1207 | ሇ | Ethiopic | ETHIOPIC SYLLABLE HOA | [100] | Recommended | Not in documented common use |
U+1287 | ኇ | Ethiopic | ETHIOPIC SYLLABLE XOA | [100] | Recommended | Not in documented common use |
U+12AF | ኯ | Ethiopic | ETHIOPIC SYLLABLE KOA | [100] | Recommended | Not in documented common use |
U+12F8 | ዸ | Ethiopic | ETHIOPIC SYLLABLE DDA | [100] | Recommended | Not in documented common use |
U+12F9 | ዹ | Ethiopic | ETHIOPIC SYLLABLE DDU | [100] | Recommended | Not in documented common use |
U+12FA | ዺ | Ethiopic | ETHIOPIC SYLLABLE DDI | [100] | Recommended | Not in documented common use |
U+12FB | ዻ | Ethiopic | ETHIOPIC SYLLABLE DDAA | [100] | Recommended | Not in documented common use |
U+12FC | ዼ | Ethiopic | ETHIOPIC SYLLABLE DDEE | [100] | Recommended | Not in documented common use |
U+12FD | ዽ | Ethiopic | ETHIOPIC SYLLABLE DDE | [100] | Recommended | Not in documented common use |
U+12FE | ዾ | Ethiopic | ETHIOPIC SYLLABLE DDO | [100] | Recommended | Not in documented common use |
U+12FF | ዿ | Ethiopic | ETHIOPIC SYLLABLE DDWA | [100] | Recommended | Not in documented common use |
U+130F | ጏ | Ethiopic | ETHIOPIC SYLLABLE GOA | [100] | Recommended | Not in documented common use |
U+131F | ጟ | Ethiopic | ETHIOPIC SYLLABLE GGWAA | [100] | Recommended | Not in documented common use |
U+1347 | ፇ | Ethiopic | ETHIOPIC SYLLABLE TZOA | [100] | Recommended | Not in documented common use |
U+135A | ፚ | Ethiopic | ETHIOPIC SYLLABLE FYA | [100] | Recommended | Not in documented common use |
U+135D | ፝ | Ethiopic | ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK | [100] | Recommended | Not in documented common use |
U+135E | ፞ | Ethiopic | ETHIOPIC COMBINING VOWEL LENGTH MARK | [100] | Recommended | Not in documented common use |
U+135F | ፟ | Ethiopic | ETHIOPIC COMBINING GEMINATION MARK | [100] | Recommended | Not in documented common use |
U+179D | ឝ | Khmer | KHMER LETTER SHA | [100] | Recommended | Not in documented common use |
U+179E | ឞ | Khmer | KHMER LETTER SSO | [100] | Recommended | Not in documented common use |
U+17A9 | ឩ | Khmer | KHMER INDEPENDENT VOWEL QUU | [100] | Recommended | Not in documented common use |
U+17B2 | ឲ | Khmer | KHMER INDEPENDENT VOWEL QOO TYPE TWO | [100] | Recommended | Not in documented common use |
U+17D7 | ៗ | Khmer | KHMER SIGN LEK TOO | [100] | Recommended | Not in documented common use |
U+1E02 | Ḃ | Latin | LATIN CAPITAL LETTER B WITH DOT ABOVE | [100] | Uppercase | |
U+1E03 | ḃ | Latin | LATIN SMALL LETTER B WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E04 | Ḅ | Latin | LATIN CAPITAL LETTER B WITH DOT BELOW | [100] | Uppercase | |
U+1E05 | ḅ | Latin | LATIN SMALL LETTER B WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E06 | Ḇ | Latin | LATIN CAPITAL LETTER B WITH LINE BELOW | [100] | Uppercase | |
U+1E07 | ḇ | Latin | LATIN SMALL LETTER B WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E08 | Ḉ | Latin | LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE | [100] | Uppercase | |
U+1E09 | ḉ | Latin | LATIN SMALL LETTER C WITH CEDILLA AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E0A | Ḋ | Latin | LATIN CAPITAL LETTER D WITH DOT ABOVE | [100] | Uppercase | |
U+1E0B | ḋ | Latin | LATIN SMALL LETTER D WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E0C | Ḍ | Latin | LATIN CAPITAL LETTER D WITH DOT BELOW | [100] | Uppercase | |
U+1E0D | ḍ | Latin | LATIN SMALL LETTER D WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E0E | Ḏ | Latin | LATIN CAPITAL LETTER D WITH LINE BELOW | [100] | Uppercase | |
U+1E0F | ḏ | Latin | LATIN SMALL LETTER D WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E10 | Ḑ | Latin | LATIN CAPITAL LETTER D WITH CEDILLA | [100] | Uppercase | |
U+1E11 | ḑ | Latin | LATIN SMALL LETTER D WITH CEDILLA | [100] | Recommended | Not in documented common use |
U+1E14 | Ḕ | Latin | LATIN CAPITAL LETTER E WITH MACRON AND GRAVE | [100] | Uppercase | |
U+1E15 | ḕ | Latin | LATIN SMALL LETTER E WITH MACRON AND GRAVE | [100] | Recommended | Not in documented common use |
U+1E16 | Ḗ | Latin | LATIN CAPITAL LETTER E WITH MACRON AND ACUTE | [100] | Uppercase | |
U+1E17 | ḗ | Latin | LATIN SMALL LETTER E WITH MACRON AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E1C | Ḝ | Latin | LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE | [100] | Uppercase | |
U+1E1D | ḝ | Latin | LATIN SMALL LETTER E WITH CEDILLA AND BREVE | [100] | Recommended | Not in documented common use |
U+1E1E | Ḟ | Latin | LATIN CAPITAL LETTER F WITH DOT ABOVE | [100] | Uppercase | |
U+1E1F | ḟ | Latin | LATIN SMALL LETTER F WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E22 | Ḣ | Latin | LATIN CAPITAL LETTER H WITH DOT ABOVE | [100] | Uppercase | |
U+1E23 | ḣ | Latin | LATIN SMALL LETTER H WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E24 | Ḥ | Latin | LATIN CAPITAL LETTER H WITH DOT BELOW | [100] | Uppercase | |
U+1E25 | ḥ | Latin | LATIN SMALL LETTER H WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E26 | Ḧ | Latin | LATIN CAPITAL LETTER H WITH DIAERESIS | [100] | Uppercase | |
U+1E27 | ḧ | Latin | LATIN SMALL LETTER H WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+1E28 | Ḩ | Latin | LATIN CAPITAL LETTER H WITH CEDILLA | [100] | Uppercase | |
U+1E29 | ḩ | Latin | LATIN SMALL LETTER H WITH CEDILLA | [100] | Recommended | Not in documented common use |
U+1E2E | Ḯ | Latin | LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE | [100] | Uppercase | |
U+1E2F | ḯ | Latin | LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E30 | Ḱ | Latin | LATIN CAPITAL LETTER K WITH ACUTE | [100] | Uppercase | |
U+1E31 | ḱ | Latin | LATIN SMALL LETTER K WITH ACUTE | [100] | Recommended | Not in documented common use |
U+1E32 | Ḳ | Latin | LATIN CAPITAL LETTER K WITH DOT BELOW | [100] | Uppercase | |
U+1E33 | ḳ | Latin | LATIN SMALL LETTER K WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E34 | Ḵ | Latin | LATIN CAPITAL LETTER K WITH LINE BELOW | [100] | Uppercase | |
U+1E35 | ḵ | Latin | LATIN SMALL LETTER K WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E38 | Ḹ | Latin | LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON | [100] | Uppercase | |
U+1E39 | ḹ | Latin | LATIN SMALL LETTER L WITH DOT BELOW AND MACRON | [100] | Recommended | Not in documented common use |
U+1E3A | Ḻ | Latin | LATIN CAPITAL LETTER L WITH LINE BELOW | [100] | Uppercase | |
U+1E3B | ḻ | Latin | LATIN SMALL LETTER L WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E3E | Ḿ | Latin | LATIN CAPITAL LETTER M WITH ACUTE | [100] | Uppercase | |
U+1E3F | ḿ | Latin | LATIN SMALL LETTER M WITH ACUTE | [100] | Recommended | Not in documented common use |
U+1E40 | Ṁ | Latin | LATIN CAPITAL LETTER M WITH DOT ABOVE | [100] | Uppercase | |
U+1E41 | ṁ | Latin | LATIN SMALL LETTER M WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E4C | Ṍ | Latin | LATIN CAPITAL LETTER O WITH TILDE AND ACUTE | [100] | Uppercase | |
U+1E4D | ṍ | Latin | LATIN SMALL LETTER O WITH TILDE AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E4E | Ṏ | Latin | LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS | [100] | Uppercase | |
U+1E4F | ṏ | Latin | LATIN SMALL LETTER O WITH TILDE AND DIAERESIS | [100] | Recommended | Not in documented common use |
U+1E50 | Ṑ | Latin | LATIN CAPITAL LETTER O WITH MACRON AND GRAVE | [100] | Uppercase | |
U+1E51 | ṑ | Latin | LATIN SMALL LETTER O WITH MACRON AND GRAVE | [100] | Recommended | Not in documented common use |
U+1E52 | Ṓ | Latin | LATIN CAPITAL LETTER O WITH MACRON AND ACUTE | [100] | Uppercase | |
U+1E53 | ṓ | Latin | LATIN SMALL LETTER O WITH MACRON AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E54 | Ṕ | Latin | LATIN CAPITAL LETTER P WITH ACUTE | [100] | Uppercase | |
U+1E55 | ṕ | Latin | LATIN SMALL LETTER P WITH ACUTE | [100] | Recommended | Not in documented common use |
U+1E56 | Ṗ | Latin | LATIN CAPITAL LETTER P WITH DOT ABOVE | [100] | Uppercase | |
U+1E57 | ṗ | Latin | LATIN SMALL LETTER P WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E58 | Ṙ | Latin | LATIN CAPITAL LETTER R WITH DOT ABOVE | [100] | Uppercase | |
U+1E59 | ṙ | Latin | LATIN SMALL LETTER R WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E5A | Ṛ | Latin | LATIN CAPITAL LETTER R WITH DOT BELOW | [100] | Uppercase | |
U+1E5B | ṛ | Latin | LATIN SMALL LETTER R WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E5C | Ṝ | Latin | LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON | [100] | Uppercase | |
U+1E5D | ṝ | Latin | LATIN SMALL LETTER R WITH DOT BELOW AND MACRON | [100] | Recommended | Not in documented common use |
U+1E5E | Ṟ | Latin | LATIN CAPITAL LETTER R WITH LINE BELOW | [100] | Uppercase | |
U+1E5F | ṟ | Latin | LATIN SMALL LETTER R WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E60 | Ṡ | Latin | LATIN CAPITAL LETTER S WITH DOT ABOVE | [100] | Uppercase | |
U+1E61 | ṡ | Latin | LATIN SMALL LETTER S WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E64 | Ṥ | Latin | LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE | [100] | Uppercase | |
U+1E65 | ṥ | Latin | LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E66 | Ṧ | Latin | LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE | [100] | Uppercase | |
U+1E67 | ṧ | Latin | LATIN SMALL LETTER S WITH CARON AND DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E68 | Ṩ | Latin | LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE | [100] | Uppercase | |
U+1E69 | ṩ | Latin | LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E6A | Ṫ | Latin | LATIN CAPITAL LETTER T WITH DOT ABOVE | [100] | Uppercase | |
U+1E6B | ṫ | Latin | LATIN SMALL LETTER T WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E6E | Ṯ | Latin | LATIN CAPITAL LETTER T WITH LINE BELOW | [100] | Uppercase | |
U+1E6F | ṯ | Latin | LATIN SMALL LETTER T WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E78 | Ṹ | Latin | LATIN CAPITAL LETTER U WITH TILDE AND ACUTE | [100] | Uppercase | |
U+1E79 | ṹ | Latin | LATIN SMALL LETTER U WITH TILDE AND ACUTE | [100] | Recommended | Not in documented common use |
U+1E7A | Ṻ | Latin | LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS | [100] | Uppercase | |
U+1E7B | ṻ | Latin | LATIN SMALL LETTER U WITH MACRON AND DIAERESIS | [100] | Recommended | Not in documented common use |
U+1E7C | Ṽ | Latin | LATIN CAPITAL LETTER V WITH TILDE | [100] | Uppercase | |
U+1E7D | ṽ | Latin | LATIN SMALL LETTER V WITH TILDE | [100] | Recommended | Not in documented common use |
U+1E7E | Ṿ | Latin | LATIN CAPITAL LETTER V WITH DOT BELOW | [100] | Uppercase | |
U+1E7F | ṿ | Latin | LATIN SMALL LETTER V WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E80 | Ẁ | Latin | LATIN CAPITAL LETTER W WITH GRAVE | [100] | Uppercase | |
U+1E81 | ẁ | Latin | LATIN SMALL LETTER W WITH GRAVE | [100] | Recommended | Not in documented common use |
U+1E82 | Ẃ | Latin | LATIN CAPITAL LETTER W WITH ACUTE | [100] | Uppercase | |
U+1E83 | ẃ | Latin | LATIN SMALL LETTER W WITH ACUTE | [100] | Recommended | Not in documented common use |
U+1E84 | Ẅ | Latin | LATIN CAPITAL LETTER W WITH DIAERESIS | [100] | Uppercase | |
U+1E85 | ẅ | Latin | LATIN SMALL LETTER W WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+1E86 | Ẇ | Latin | LATIN CAPITAL LETTER W WITH DOT ABOVE | [100] | Uppercase | |
U+1E87 | ẇ | Latin | LATIN SMALL LETTER W WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E88 | Ẉ | Latin | LATIN CAPITAL LETTER W WITH DOT BELOW | [100] | Uppercase | |
U+1E89 | ẉ | Latin | LATIN SMALL LETTER W WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E8A | Ẋ | Latin | LATIN CAPITAL LETTER X WITH DOT ABOVE | [100] | Uppercase | |
U+1E8B | ẋ | Latin | LATIN SMALL LETTER X WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E8E | Ẏ | Latin | LATIN CAPITAL LETTER Y WITH DOT ABOVE | [100] | Uppercase | |
U+1E8F | ẏ | Latin | LATIN SMALL LETTER Y WITH DOT ABOVE | [100] | Recommended | Not in documented common use |
U+1E90 | Ẑ | Latin | LATIN CAPITAL LETTER Z WITH CIRCUMFLEX | [100] | Uppercase | |
U+1E91 | ẑ | Latin | LATIN SMALL LETTER Z WITH CIRCUMFLEX | [100] | Recommended | Not in documented common use |
U+1E92 | Ẓ | Latin | LATIN CAPITAL LETTER Z WITH DOT BELOW | [100] | Uppercase | |
U+1E93 | ẓ | Latin | LATIN SMALL LETTER Z WITH DOT BELOW | [100] | Recommended | Not in documented common use |
U+1E94 | Ẕ | Latin | LATIN CAPITAL LETTER Z WITH LINE BELOW | [100] | Uppercase | |
U+1E95 | ẕ | Latin | LATIN SMALL LETTER Z WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E96 | ẖ | Latin | LATIN SMALL LETTER H WITH LINE BELOW | [100] | Recommended | Not in documented common use |
U+1E97 | ẗ | Latin | LATIN SMALL LETTER T WITH DIAERESIS | [100] | Recommended | Not in documented common use |
U+1E98 | ẘ | Latin | LATIN SMALL LETTER W WITH RING ABOVE | [100] | Recommended | Not in documented common use |
U+1E99 | ẙ | Latin | LATIN SMALL LETTER Y WITH RING ABOVE | [100] | Recommended | Not in documented common use |
U+2D80 | ⶀ | Ethiopic | ETHIOPIC SYLLABLE LOA | [100] | Recommended | Not in documented common use |
U+2D81 | ⶁ | Ethiopic | ETHIOPIC SYLLABLE MOA | [100] | Recommended | Not in documented common use |
U+2D82 | ⶂ | Ethiopic | ETHIOPIC SYLLABLE ROA | [100] | Recommended | Not in documented common use |
U+2D83 | ⶃ | Ethiopic | ETHIOPIC SYLLABLE SOA | [100] | Recommended | Not in documented common use |
U+2D84 | ⶄ | Ethiopic | ETHIOPIC SYLLABLE SHOA | [100] | Recommended | Not in documented common use |
U+2D85 | ⶅ | Ethiopic | ETHIOPIC SYLLABLE BOA | [100] | Recommended | Not in documented common use |
U+2D86 | ⶆ | Ethiopic | ETHIOPIC SYLLABLE TOA | [100] | Recommended | Not in documented common use |
U+2D87 | ⶇ | Ethiopic | ETHIOPIC SYLLABLE COA | [100] | Recommended | Not in documented common use |
U+2D88 | ⶈ | Ethiopic | ETHIOPIC SYLLABLE NOA | [100] | Recommended | Not in documented common use |
U+2D89 | ⶉ | Ethiopic | ETHIOPIC SYLLABLE NYOA | [100] | Recommended | Not in documented common use |
U+2D8A | ⶊ | Ethiopic | ETHIOPIC SYLLABLE GLOTTAL OA | [100] | Recommended | Not in documented common use |
U+2D8B | ⶋ | Ethiopic | ETHIOPIC SYLLABLE ZOA | [100] | Recommended | Not in documented common use |
U+2D8C | ⶌ | Ethiopic | ETHIOPIC SYLLABLE DOA | [100] | Recommended | Not in documented common use |
U+2D8D | ⶍ | Ethiopic | ETHIOPIC SYLLABLE DDOA | [100] | Recommended | Not in documented common use |
U+2D8E | ⶎ | Ethiopic | ETHIOPIC SYLLABLE JOA | [100] | Recommended | Not in documented common use |
U+2D8F | ⶏ | Ethiopic | ETHIOPIC SYLLABLE THOA | [100] | Recommended | Not in documented common use |
U+2D90 | ⶐ | Ethiopic | ETHIOPIC SYLLABLE CHOA | [100] | Recommended | Not in documented common use |
U+2D91 | ⶑ | Ethiopic | ETHIOPIC SYLLABLE PHOA | [100] | Recommended | Not in documented common use |
U+2D92 | ⶒ | Ethiopic | ETHIOPIC SYLLABLE POA | [100] | Recommended | Not in documented common use |
U+2D93 | ⶓ | Ethiopic | ETHIOPIC SYLLABLE GGWA | [100] | Recommended | Not in documented common use |
U+2D94 | ⶔ | Ethiopic | ETHIOPIC SYLLABLE GGWI | [100] | Recommended | Not in documented common use |
U+2D95 | ⶕ | Ethiopic | ETHIOPIC SYLLABLE GGWEE | [100] | Recommended | Not in documented common use |
U+2D96 | ⶖ | Ethiopic | ETHIOPIC SYLLABLE GGWE | [100] | Recommended | Not in documented common use |
U+A7B9 | ꞹ | Latin | LATIN SMALL LETTER U WITH STROKE | [100] | Recommended | Not in documented common use |
U+AB01 | ꬁ | Ethiopic | ETHIOPIC SYLLABLE TTHU | [100] | Recommended | Not in documented common use |
U+AB02 | ꬂ | Ethiopic | ETHIOPIC SYLLABLE TTHI | [100] | Recommended | Not in documented common use |
U+AB03 | ꬃ | Ethiopic | ETHIOPIC SYLLABLE TTHAA | [100] | Recommended | Not in documented common use |
U+AB04 | ꬄ | Ethiopic | ETHIOPIC SYLLABLE TTHEE | [100] | Recommended | Not in documented common use |
U+AB05 | ꬅ | Ethiopic | ETHIOPIC SYLLABLE TTHE | [100] | Recommended | Not in documented common use |
U+AB06 | ꬆ | Ethiopic | ETHIOPIC SYLLABLE TTHO | [100] | Recommended | Not in documented common use |
U+AB09 | ꬉ | Ethiopic | ETHIOPIC SYLLABLE DDHU | [100] | Recommended | Not in documented common use |
U+AB0A | ꬊ | Ethiopic | ETHIOPIC SYLLABLE DDHI | [100] | Recommended | Not in documented common use |
U+AB0B | ꬋ | Ethiopic | ETHIOPIC SYLLABLE DDHAA | [100] | Recommended | Not in documented common use |
U+AB0C | ꬌ | Ethiopic | ETHIOPIC SYLLABLE DDHEE | [100] | Recommended | Not in documented common use |
U+AB0D | ꬍ | Ethiopic | ETHIOPIC SYLLABLE DDHE | [100] | Recommended | Not in documented common use |
U+AB0E | ꬎ | Ethiopic | ETHIOPIC SYLLABLE DDHO | [100] | Recommended | Not in documented common use |
- Code Point: A code point or code point sequence.
- Glyph: The shape displayed depends on the fonts available to your browser.
- Script: Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited and code points used with more than one script may have the value Common.
- Name: Shows the character or sequence name from the Unicode Character Database.
- Ref: Links to the references associated with the code point or sequence, if any.
- Tags: LGR-defined tag values. Any tags matching the Unicode script property are suppressed in this view.
- Comment: The comment as given in the XML file. However, if the comment for this row consists only of the code point or sequence name, it is suppressed in this view. By convention, comments starting with “=” denote an alias. If present, the symbol ⍟ marks a default item shared among a set of LGRs.
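The column layout described in this legend can be illustrated with a short Python sketch. It assumes the pipe-delimited text rendering used in this document (the XML file remains the normative source), treats the Tags cell as whitespace-separated, and handles single code points only. The names `Row` and `parse_row` are hypothetical and are not part of any LGR tooling.

```python
import unicodedata
from typing import List, NamedTuple


class Row(NamedTuple):
    """One repertoire row; field names mirror the column legend (hypothetical type)."""
    code_point: int
    glyph: str
    script: str
    name: str
    ref: str
    tags: List[str]
    comment: str


def parse_row(line: str) -> Row:
    # Split on the column separator and trim each cell.
    cells = [cell.strip() for cell in line.split("|")]
    # The trailing "|" leaves one empty string at the end; drop it.
    if cells and cells[-1] == "":
        cells.pop()
    # Pad to seven cells in case the empty Comment cell was lost.
    cells += [""] * (7 - len(cells))
    cp, glyph, script, name, ref, tags, comment = cells[:7]
    if not cp.startswith("U+"):
        raise ValueError(f"not a single code point: {cp!r}")
    # Assumption: multiple tags, if present, are separated by whitespace.
    return Row(int(cp[2:], 16), glyph, script, name, ref, tags.split(), comment)


upper = parse_row("U+1E02 | Ḃ | Latin | LATIN CAPITAL LETTER B WITH DOT ABOVE | [100] | Uppercase | |")
lower = parse_row("U+1E03 | ḃ | Latin | LATIN SMALL LETTER B WITH DOT ABOVE | [100] | Recommended | Not in documented common use |")

# The Name column should agree with the Unicode Character Database.
names_match = unicodedata.name(chr(upper.code_point)) == upper.name
# In the Latin rows, an Uppercase entry is the case pair of the Recommended entry after it.
is_case_pair = chr(upper.code_point).lower() == chr(lower.code_point)
```

Rows tagged Uppercase carry an empty Comment cell, which is why the sketch pads the cell list before unpacking.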
Variants
This LGR does not specify any variants.
Classes, Rules and Actions
Character Classes
Number of named classes | 2 |
---|---|
Implicit (except script) | 4 |
The following table lists all named and implicit classes with their definitions and a list of their members intersected with the current repertoire (for larger classes, this list is elided).
Name | Definition | Count | Members or Ranges | Ref | Comment |
---|---|---|---|---|---|
Digits | Prop=gc:Nd | 760→70 | {0A66-0A6F 0B66-0B6F 0BE6-0BEF 0C66-0C6F 0D66-0D6F 0DE6-0DEF 1090-1099} | | Any character matching Unicode property General_Category:Decimal_Number |
Uppercase | Prop=gc:Lu | 1858→88 | {0114 012C 014E 0156 0162 01D5 01D7 01D9 01DB 01DE 01E0 01E2 01EA 01EC 01F4 01F8 01FA 01FC 01FE 021E 0226 0228 022A 022C 022E 0230 0232 0400 040D 04C1 04CB ...} | | Any character matching Unicode property General_Category:Uppercase_Letter |
implicit | Tag=Recommended | 344 | {0115 012D 014F 0157 0163 01D6 01D8 01DA 01DC 01DF 01E1 01E3 01EB 01ED 01F0 01F5 01F9 01FB 01FD 01FF 021F 0227 0229 022B 022D 022F 0231 0233 0450 045D 04C2 ...} | | Any character tagged as Recommended |
implicit | Tag=RefLGR | 2332→0 | {} | | Any character tagged as RefLGR |
implicit | Tag=RefLGRBySequence | 13→0 | {} | | Any character tagged as RefLGRBySequence |
implicit | Tag=Uppercase | 88 | {0114 012C 014E 0156 0162 01D5 01D7 01D9 01DB 01DE 01E0 01E2 01EA 01EC 01F4 01F8 01FA 01FC 01FE 021E 0226 0228 022A 022C 022E 0230 0232 0400 040D 04C1 04CB ...} | | Any character tagged as Uppercase |
- Members or Ranges: Lists the members of the class as code points (xxx) or as ranges of code points (xxx-yyy). Any class too numerous to list in full is elided with "...".
- m→n: Indicates a set for which only n of its m members fall inside the repertoire.
- Tag=ttt: A named or implicit class defined by all code points that share the given tag value (ttt).
- Prop=ppp:vvv: A named class defined by reference to value vvv of Unicode property ppp.
- Implicit: An anonymous class implicitly defined based on tag value and for which there is no named equivalent.
Note: The following named classes are defined but not used in this LGR: Digits, Uppercase.
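As a rough illustration of the Prop=ppp:vvv and m→n notation, the following Python sketch expands the ranges listed for the Digits class and checks each member against General_Category Nd using the standard `unicodedata` module. The helper name `expand_ranges` is hypothetical. The result depends on the Unicode version bundled with the Python interpreter, which is assumed to be recent enough to include all of the listed digits.

```python
import unicodedata

# Ranges copied from the "Members or Ranges" cell of the Digits class.
DIGIT_RANGES = "0A66-0A6F 0B66-0B6F 0BE6-0BEF 0C66-0C6F 0D66-0D6F 0DE6-0DEF 1090-1099"


def expand_ranges(spec):
    """Expand 'xxx' and 'xxx-yyy' items (hexadecimal) into a list of code points."""
    members = []
    for item in spec.split():
        first, _, last = item.partition("-")
        start = int(first, 16)
        # A bare code point is treated as a range of length one.
        end = int(last, 16) if last else start
        members.extend(range(start, end + 1))
    return members


digits = expand_ranges(DIGIT_RANGES)
# Keep only the members whose General_Category is Decimal_Number (Nd).
in_class = [cp for cp in digits if unicodedata.category(chr(cp)) == "Nd"]
print(len(digits), len(in_class))
```

The seven ranges expand to 70 code points, which corresponds to the n in 760→70. The m value (760) counts every Nd character in the Unicode version of the LGR and is not recomputed here.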
Whole label evaluation and context rules
The LGR does not define any rules.
Actions
The LGR does not define any actions.
Table of References
The following lists the references cited for specific code points, variants, classes, rules or actions in this LGR.
[EGIDS] | Lewis and Simons, “EGIDS: Expanded Graded Intergenerational Disruption Scale,” documented in [SIL-Ethnologue] and summarized here: https://en.wikipedia.org/wiki/Expanded_Graded_Intergenerational_Disruption_Scale_(EGIDS) |
[IAB] | IAB Statement on Identifiers and Unicode 7.0.0, https://datatracker.ietf.org/doc/statement-iab-statement-on-identifiers-and-unicode-7-0-0/01/pdf/ |
[MSR] | ICANN, “Maximal Starting Repertoire”, https://www.icann.org/resources/pages/msr-2015-06-21-en |
[Proposal-Arabic] | “Proposal for Arabic Script Root Zone LGR”, https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf |
[RefLGR] | ICANN, “Second-Level Reference Label Generation Rules”, https://www.icann.org/resources/pages/second-level-lgr-2015-06-21-en |
[RefLGR-Overview] | ICANN, “Reference Label Generation Rules (LGR) for the Second Level — Overview and Summary”, https://www.icann.org/sites/default/files/packages/lgr/lgr-second-level-overview-summary-25oct24-en.pdf |
[RZ-LGR] | ICANN, “Root Zone Label Generation Rules”, https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en |
[SIL-Ethnologue] | David M. Eberhard, Gary F. Simons & Charles D. Fennig (eds.). 2021. Ethnologue: Languages of the World, Twenty-fourth edition. Dallas, Texas: SIL International. Online version available as https://www.ethnologue.com |
[100] | The Unicode Consortium: Identifier_Type property for Unicode Version 16.0.0, available as https://unicode.org/Public/security/16.0.0/IdentifierType.txt |