IDN Character Categorization

$Date: 2005/03/30 17:19:32 $, MED

This page lists all of the valid output IDN characters, broken down by category. By "output" IDN characters, we mean those that can result from nameprep. Characters are grouped first by script and then by subcategory. Within each subcategory, characters are sorted according to the default UCA order. Tooltips provide the character code and name (in enabled browsers).
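As a rough illustration of the tooltip text described above (character code plus name), here is a minimal sketch. The function name tooltip and the "<unnamed>" fallback are assumptions made for this example, and the names come from whatever Unicode version the local Python ships with, not necessarily the one used to build this page.

```python
import unicodedata


def tooltip(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is supplied, so pass a placeholder.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


print(tooltip("a"))
print(tooltip("\u20ac"))
```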

Key
Subcategory Description
Atomic Characters that don't fall into any of the following subcategories
Atomic-no-uppercase For bicameral scripts, Atomic characters without an uppercase.
Pattern_Syntax Characters recommended as a basis for use in pattern syntax. Excludes the word characters described in Section 4, Word Boundaries, of UAX #29 (in the Word_Break property and the notes at the end of that section). See UAX #31: Identifier and Pattern Syntax.
Non-XID Characters not recommended as a basis for identifiers, excluding Pattern_Syntax characters and the word characters described in Section 4, Word Boundaries, of UAX #29 (in the Word_Break property and the notes at the end of that section). See UAX #31: Identifier and Pattern Syntax (XID_Continue).
Decomposable Characters with NFC decompositions.
IDN-Remapped-Case-Atomic Characters remapped by IDN due to case folding.
IDN-Remapped-Case-Decomposable Characters remapped by IDN due to case folding that are also decomposable.
IDN-Remapped-Compat Characters remapped by IDN due to compatibility mapping.
IDN-Deleted Characters deleted by IDN.
IDN-Illegal Characters illegal in IDN (note: most of these are due to IDN's using an old version of Unicode).
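
The subcategories in the key can be approximated for a single character with Python's standard library. The sketch below is not how this page was generated. It assumes the stringprep module (RFC 3454 tables) and unicodedata.ucd_3_2_0 (the Unicode 3.2 database that nameprep is based on). The function name classify and the label "Atomic-like" are inventions for this example. It treats only unassigned code points (table A.1) as IDN-Illegal, and it does not try to tell Atomic, Atomic-no-uppercase, Pattern_Syntax, and Non-XID apart.

```python
import stringprep
from unicodedata import ucd_3_2_0 as ucd  # Unicode 3.2 data, as used by nameprep


def classify(ch: str) -> str:
    """Approximate the page's subcategory for one character (hypothetical helper)."""
    # Unassigned in Unicode 3.2: a simplification of "illegal in IDN".
    if stringprep.in_table_a1(ch):
        return "IDN-Illegal"
    # Table B.1: characters that nameprep maps to nothing.
    if stringprep.in_table_b1(ch):
        return "IDN-Deleted"
    nfd = ucd.normalize("NFD", ch)
    # A compatibility mapping exists if NFKD goes further than NFD.
    if ucd.normalize("NFKD", ch) != nfd:
        return "IDN-Remapped-Compat"
    # Table B.3: case folding without normalization.
    if stringprep.map_table_b3(ch) != ch:
        if nfd != ch:
            return "IDN-Remapped-Case-Decomposable"
        return "IDN-Remapped-Case-Atomic"
    # Canonically decomposable but otherwise unchanged.
    if nfd != ch:
        return "Decomposable"
    # Everything else; the finer distinctions in the key need extra data.
    return "Atomic-like"


for sample in ["a", "\u00e1", "A", "\u00c1", "\u00aa", "\u00df", "\u00ad", "\u0221"]:
    print(f"U+{ord(sample):04X}", classify(sample))
```

Checking for a compatibility mapping before case folding is a deliberate choice here: it puts characters such as U+01F1 (DZ) under IDN-Remapped-Compat, which matches where the listing below places them.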

The information in the categorization is also available in a plain-text file, at idn-chars.txt. It can be viewed as is, or loaded into a spreadsheet for sorting and filtering to view the data in different ways. The format is:

code ; script ; subcategory # general-category (character) character-name

Examples:

0061          ; LATIN ; Atomic # ; L& (a) LATIN SMALL LETTER A
2015          ; COMMON ; Pattern_Syntax # Pd (―) HORIZONTAL BAR
058A          ; ARMENIAN ; Atomic-no-uppercase # ; Pd (֊) ARMENIAN HYPHEN
20AC          ; COMMON ; Non-XID # ; Sc (€) EURO SIGN
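
As an alternative to a spreadsheet, lines in this format can be read with a short script. The following is a minimal sketch based only on the format line and the examples above. The names IdnEntry, parse_line, and count_by_category are assumptions for this example, and the pattern accepts an optional ";" after the "#" because the examples show both forms. Lines that do not match (blank lines, for instance) are skipped.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class IdnEntry(NamedTuple):
    code: int              # code point, parsed from the hex field
    script: str
    subcategory: str
    general_category: str
    character: str
    name: str


# code ; script ; subcategory # [;] general-category (character) character-name
_LINE = re.compile(
    r"^\s*(?P<code>[0-9A-Fa-f]{4,6})\s*;\s*(?P<script>[^;#]+?)\s*;\s*"
    r"(?P<subcat>[^;#]+?)\s*#\s*(?:;\s*)?(?P<gc>\S+)\s+"
    r"\((?P<char>.*?)\)\s+(?P<name>.+?)\s*$"
)


def parse_line(line: str) -> Optional[IdnEntry]:
    """Parse one data line; return None if it does not match the format."""
    m = _LINE.match(line)
    if m is None:
        return None
    return IdnEntry(
        code=int(m.group("code"), 16),
        script=m.group("script"),
        subcategory=m.group("subcat"),
        general_category=m.group("gc"),
        character=m.group("char"),
        name=m.group("name"),
    )


def count_by_category(lines: Iterable[str]) -> Counter:
    """Count entries per (script, subcategory) pair."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry.script, entry.subcategory)] += 1
    return counts


examples = [
    "0061          ; LATIN ; Atomic # ; L& (a) LATIN SMALL LETTER A",
    "2015          ; COMMON ; Pattern_Syntax # Pd (\u2015) HORIZONTAL BAR",
]
for text in examples:
    print(parse_line(text))
```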

Categorization

Script: LATIN
Atomic (80)
a æ b ɓ ƃ c ƈ d đ ɖ ɗ ƌ ð e ǝ ə ɛ f ƒ g ǥ ɠ ɣ ƣ h ƕ ħ i ı ɨ ɩ j k ƙ l ł ƚ m n ɲ ƞ ŋ o œ ø ɔ ɵ ȣ p ƥ q r ʀ s ʃ t ŧ ƭ ʈ u ɯ ʊ v ʋ w x y ƴ z ƶ ȥ ʒ ƹ ȝ þ ƿ ƨ ƽ ƅ ʔ
Atomic-no-uppercase (87)
ɐ ɑ ɒ ʙ ƀ ɕ ʣ ʥ ʤ ɘ ɚ ɜ ɝ ɞ ʚ ɤ ʩ ɡ ɢ ʛ ʜ ɦ ɧ ɪ ʝ ɟ ʄ ʞ ʪ ʫ ʟ ɫ ɬ ɭ ɮ ƛ ʎ ɱ ɴ ɳ ɶ ɷ ɸ ʠ ĸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ ʁ ʂ ƪ ʅ ʆ ʨ ƾ ʦ ʧ ƫ ʇ ʉ ɥ ɰ ʌ ʍ ʏ ƍ ʐ ʑ ƺ ʓ ƻ ʕ ʡ ʢ ʖ ǀ ǁ ǂ ǃ ʗ ʘ ʬ ʭ
Decomposable (250)
á à ă â ǎ å ǻ ä ǟ ã ȧ ǡ ą ā ȁ ȃ ǽ ǣ ć ĉ č ċ ç ď é è ĕ ê ế ě ë ė ȩ ę ē ȅ ȇ ǵ ğ ĝ ǧ ġ ģ ĥ ȟ í ì ĭ î ǐ ï ĩ į ī ȉ ȋ ĵ ǰ ǩ ķ ĺ ľ ļ ḿ ń ǹ ň ñ ņ ó ò ŏ ô ǒ ö ȫ ő õ ȭ ȯ ȱ ǫ ǭ ō ȍ ȏ ơ ǿ ŕ ř ŗ ȑ ȓ ś ŝ š ş ș ť ţ ț ú ù ŭ û ǔ ů ü ǘ ǜ ǚ ǖ ű ũ ų ū ȕ ȗ ư ṿ ŵ ý ŷ ÿ ȳ ź ž ż ǯ
IDN-Remapped-Case-Atomic (78)
A Æ B Ɓ Ƃ C Ƈ D Đ Ɖ Ɗ Ƌ Ð E Ǝ Ə Ɛ F Ƒ G Ǥ Ɠ Ɣ Ƣ H Ƕ Ħ I Ɨ Ɩ J K Ƙ L Ł M N Ɲ Ƞ Ŋ O Œ Ø Ɔ Ɵ Ȣ P Ƥ Q R Ʀ S ß Ʃ T Ŧ Ƭ Ʈ U Ɯ Ʊ V Ʋ W X Y Ƴ Z Ƶ Ȥ Ʒ Ƹ Ȝ Þ Ƿ Ƨ Ƽ Ƅ
IDN-Remapped-Case-Decomposable (246)
Á À Ă Â Ǎ Å Ǻ Ä Ǟ Ã Ȧ Ǡ Ą Ā Ȁ Ȃ Ǽ Ǣ Ć Ĉ Č Ċ Ç Ď É È Ĕ Ê Ě Ë Ė Ȩ Ę Ē Ȅ Ȇ Ǵ Ğ Ĝ Ǧ Ġ Ģ Ĥ Ȟ Í Ì Ĭ Î Ǐ Ï Ĩ İ Į Ī Ȉ Ȋ Ĵ Ǩ Ķ Ĺ Ľ Ļ Ń Ǹ Ň Ñ Ņ Ó Ò Ŏ Ô Ǒ Ö Ȫ Ő Õ Ȭ Ȯ Ȱ Ǫ Ǭ Ō Ȍ Ȏ Ơ Ǿ Ŕ Ř Ŗ Ȑ Ȓ Ś Ŝ Š Ş Ș Ť Ţ Ț Ú Ù Ŭ Û Ǔ Ů Ü Ǘ Ǜ Ǚ Ǖ Ű Ũ Ų Ū Ȕ Ȗ Ư Ŵ Ý Ŷ Ÿ Ȳ Ź Ž Ż Ǯ
IDN-Remapped-Compat (99)
ª dz Dz DZ dž Dž DŽ ˠ ʰ ʱ ij IJ ʲ ˡ ŀ Ŀ lj Lj LJ nj Nj NJ º ʳ ʴ ʵ ʶ ˢ ſ ʷ ˣ ʸ ʼn ˤ
IDN-Illegal (197)
ȡ ȴ ȵ ᴿ ȶ ʮ ʯ ȷ ȸ ȹ Ⱥ Ȼ ȼ Ƚ Ⱦ ȿ ɀ Ɂ ᵿ ᶿ
Script: GREEK
Atomic (29)
α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ϙ ρ σ τ υ φ χ ψ ω ϡ
Atomic-no-uppercase (2)
ϳ ϗ
Non-XID (2)
͵ ϶
Decomposable (87)
ά έ ή ί ϊ ΐ ό ύ ϋ ΰ ώ
IDN-Remapped-Case-Atomic (30)
Α Β Γ Δ Ε Ϝ Ϛ Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ϟ Ϙ Ρ Σ ς Τ Υ Φ Χ Ψ Ω Ϡ
IDN-Remapped-Case-Decomposable (134)
Ά Έ Ή Ἷ Ί Ϊ Ό Ύ Ϋ Ώ
IDN-Remapped-Compat (31)
ʹ ϐ ϵ ϑ ϴ ϰ ϖ ϱ ϲ ϒ ϓ ϔ ϕ
IDN-Illegal (187)
΄ ΅ ᾿ ͺ Ϲ ϸ Ϸ ϻ Ϻ ϼ Ͻ Ͼ Ͽ 𐅀 𐅁 𐅂 𐅃 𐅄 𐅅 𐅆 𐅇 𐅈 𐅉 𐅊 𐅋 𐅌 𐅍 𐅎 𐅏 𐅐 𐅑 𐅒 𐅓 𐅔 𐅕 𐅖 𐅗 𐅘 𐅙 𐅚 𐅛 𐅜 𐅝 𐅞 𐅟 𐅠 𐅡 𐅢 𐅣 𐅤 𐅥 𐅦 𐅧 𐅨 𐅩 𐅪 𐅫 𐅬 𐅭 𐅮 𐅯 𐅰 𐅱 𐅲 𐅳 𐅴 𐅵 𐅶 𐅷 𐅸 𐅹 𐅺 𐅻 𐅼 𐅽 𐅾 𐅿 𐆀 𐆁 𐆂 𐆃 𐆄 𐆅 𐆆 𐆇 𐆈 𐆉 𐆊 𝈀 𝈁 𝈂 𝈃 𝈄 𝈅 𝈆 𝈇 𝈈 𝈉 𝈊 𝈋 𝈌 𝈍 𝈎 𝈏 𝈐 𝈑 𝈒 𝈓 𝈔 𝈕 𝈖 𝈗 𝈘 𝈙 𝈚 𝈛 𝈜 𝈝 𝈞 𝈟 𝈠 𝈡 𝈢 𝈣 𝈤 𝈥 𝈦 𝈧 𝈨 𝈩 𝈪 𝈫 𝈬 𝈭 𝈮 𝈯 𝈰 𝈱 𝈲 𝈳 𝈴 𝈵 𝈶 𝈷 𝈸 𝈹 𝈺 𝈻 𝈼 𝈽 𝈾 𝈿 𝉀 𝉁  𝉂  𝉃  𝉄 𝉅
Script: CYRILLIC
Atomic (102)
а ә ӕ б в г ґ ғ ҕ д ԁ ђ ԃ ҙ е є ж җ з ԅ ѕ ӡ ԇ и ҋ і ј к қ ӄ ҡ ҟ ҝ л ӆ љ ԉ м ӎ н ӊ ң ӈ ҥ њ ԋ о ө п ҧ ҁ р ҏ с ԍ ҫ т ԏ ҭ ћ у ү ұ ѹ ф х ҳ һ ѡ ѿ ѽ ѻ ц ҵ ч ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ь ҍ ѣ э ю я ѥ ѧ ѫ ѩ ѭ ѯ ѱ ѳ ѵ ҩ Ӏ
Atomic-no-uppercase (4)
 ҃  ҄  ҅  ҆
Non-XID (3)
 ҈  ҉ ҂
Decomposable (26)
ӑ ӓ ӛ ѓ ѐ ё ӗ ӂ ӝ ӟ ѝ ӣ ӥ ї й ӧ ӫ ќ ӯ ў ӱ ӳ ӵ ӹ ӭ ѷ
IDN-Remapped-Case-Atomic (101)
А Ә Ӕ Б В Г Ґ Ғ Ҕ Д Ԁ Ђ Ԃ Ҙ Е Є Ж Җ З Ԅ Ѕ Ӡ Ԇ И Ҋ І Ј К Қ Ӄ Ҡ Ҟ Ҝ Л Ӆ Љ Ԉ М Ӎ Н Ӊ Ң Ӈ Ҥ Њ Ԋ О Ө П Ҧ Ҁ Р Ҏ С Ԍ Ҫ Т Ԏ Ҭ Ћ У Ү Ұ Ѹ Ф Х Ҳ Һ Ѡ Ѿ Ѽ Ѻ Ц Ҵ Ч Ҷ Ӌ Ҹ Ҽ Ҿ Џ Ш Щ Ъ Ы Ь Ҍ Ѣ Э Ю Я Ѥ Ѧ Ѫ Ѩ Ѭ Ѯ Ѱ Ѳ Ѵ Ҩ
IDN-Remapped-Case-Decomposable (26)
Ӑ Ӓ Ӛ Ѓ Ѐ Ё Ӗ Ӂ Ӝ Ӟ Ѝ Ӣ Ӥ Ї Й Ӧ Ӫ Ќ Ӯ Ў Ӱ Ӳ Ӵ Ӹ Ӭ Ѷ
IDN-Illegal (4)
Ӷ ӷ
Script: ARMENIAN
Atomic (39)
֊ ա բ գ դ ե զ է ը թ ժ ի լ խ ծ կ հ ձ ղ ճ մ յ ն շ ո չ պ ջ ռ ս վ տ ր ց ւ փ