L2/06-242
Subject: Distinguishing Sk, Lm, Mc
From: Mark Davis
Date: 2007-07-25
We have two kinds of general category property values called modifiers: Sk (Symbol, Modifier) and Lm (Letter, Modifier). We also have Mc (Mark, Spacing Combining) which are spacing marks that modify letters. In addition, we have characters with "MODIFIER LETTER" in the name.
Unfortunately, there is no alignment between these, and people get confused. Of the characters with MODIFIER LETTER in the name, 63 are Sk and 134 are Lm, while of those without MODIFIER LETTER in their names, 36 are Sk and 33 are Lm. At least we got one thing right: no Mc characters have "MODIFIER LETTER" in their names!
In the discussion of a UTC document (02-267), Ken gave the relevant distinction: the Lm's are used as parts of words and identifiers, while the Sk's are not. Although that reason was not captured in the minutes, it was in my notes and in a modification history note in one of the UAXs.
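The "parts of words and identifiers" distinction can be observed from a scripting language. The following is a minimal sketch, not part of the original proposal: it assumes Python's `str.isalpha` as a rough stand-in for "part of a word" and `str.isidentifier` (which follows XID_Start/XID_Continue) as a stand-in for "part of an identifier". The helper name `describe` and the four sample code points are chosen only for illustration, and the results reflect whatever Unicode version the Python build ships.

```python
import unicodedata


def describe(cp: int) -> dict:
    """Report the general category of a code point and how Python treats it."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "gc": unicodedata.category(ch),
        # Rough proxy for "can be part of a word".
        "alphabetic": ch.isalpha(),
        # Rough proxy for "can continue an identifier": prefix a plain letter
        # so that only the continuation position is being tested.
        "identifier_part": ("x" + ch).isidentifier(),
    }


# Two Lm characters and two Sk characters taken from the lists in this document.
for cp in (0x02B0, 0x02BC, 0x02C2, 0x005E):
    print(describe(cp))
```

Individual exceptions exist on both sides (for example, characters excluded from the XID properties because of their compatibility decompositions), so this is a spot check rather than a definition of either category.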
We should:
- Document in UCD.html for U5+ (the next version after 5.0) the reasons that we distinguish Sk, Lm, and Mc, so that people understand what we mean by each and when they should use one versus another.
- Review the assignments listed at the end of this document to ensure that our assignments follow those descriptions.
- Document that many MODIFIER LETTERs are not actually modifier letters (in places where we document that names are misleading).
Documents
- http://www.unicode.org/L2/L2002/02267r3-prop-fixes.html
- http://www.unicode.org/consortium/utc-minutes/UTC-092-200208.html
- [92-C27] Consensus: Change the general category of 02B9..02BA, 02C6..02CF from Sk to Lm for Unicode 4.0. See section 1.D.a of L2/02-267R3.
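The effect of consensus 92-C27 can be spot-checked against any Unicode character database from 4.0 onward. The sketch below is an illustration only, assuming Python's standard `unicodedata` module; the names `CHANGED` and `categories` are hypothetical helpers introduced here.

```python
import unicodedata

# Code points named in 92-C27: 02B9..02BA and 02C6..02CF (inclusive ranges).
CHANGED = list(range(0x02B9, 0x02BA + 1)) + list(range(0x02C6, 0x02CF + 1))


def categories(code_points):
    """Map each code point to the general category reported by this Python build."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp)) for cp in code_points}


for code, gc in categories(CHANGED).items():
    print(code, gc)
```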
Breakdown
In [$gc:Sk], but not in [$Name:«.*MODIFIER LETTER.*»] :
005E # Sk (^) CIRCUMFLEX ACCENT
0060 # Sk (`) GRAVE ACCENT
00A8 # Sk (¨) DIAERESIS
00AF # Sk (¯) MACRON
00B4 # Sk (´) ACUTE ACCENT
00B8 # Sk (¸) CEDILLA
02D8 # Sk (˘) BREVE
02D9 # Sk (˙) DOT ABOVE
02DA # Sk (˚) RING ABOVE
02DB # Sk (˛) OGONEK
02DC # Sk (˜) SMALL TILDE
02DD # Sk (˝) DOUBLE ACUTE ACCENT
0374 # Sk (ʹ) GREEK NUMERAL SIGN
0375 # Sk (͵) GREEK LOWER NUMERAL SIGN
0384 # Sk (΄) GREEK TONOS
0385 # Sk (΅) GREEK DIALYTIKA TONOS
1FBD # Sk (᾽) GREEK KORONIS
1FBF # Sk (᾿) GREEK PSILI
1FC0 # Sk (῀) GREEK PERISPOMENI
1FC1 # Sk (῁) GREEK DIALYTIKA AND PERISPOMENI
1FCD # Sk (῍) GREEK PSILI AND VARIA
1FCE # Sk (῎) GREEK PSILI AND OXIA
1FCF # Sk (῏) GREEK PSILI AND PERISPOMENI
1FDD # Sk (῝) GREEK DASIA AND VARIA
1FDE # Sk (῞) GREEK DASIA AND OXIA
1FDF # Sk (῟) GREEK DASIA AND PERISPOMENI
1FED # Sk (῭) GREEK DIALYTIKA AND VARIA
1FEE # Sk (΅) GREEK DIALYTIKA AND OXIA
1FEF # Sk (`) GREEK VARIA
1FFD # Sk (´) GREEK OXIA
1FFE # Sk (῾) GREEK DASIA
309B # Sk (゛) KATAKANA-HIRAGANA VOICED SOUND MARK
309C # Sk (゜) KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
FF3E # Sk (^) FULLWIDTH CIRCUMFLEX ACCENT
FF40 # Sk (`) FULLWIDTH GRAVE ACCENT
FFE3 # Sk ( ̄) FULLWIDTH MACRON
# Total code points: 36
In both [$gc:Sk], and in [$Name:«.*MODIFIER LETTER.*»] :
02C2 # Sk (˂) MODIFIER LETTER LEFT ARROWHEAD
02C3 # Sk (˃) MODIFIER LETTER RIGHT ARROWHEAD
02C4 # Sk (˄) MODIFIER LETTER UP ARROWHEAD
02C5 # Sk (˅) MODIFIER LETTER DOWN ARROWHEAD
02D2 # Sk (˒) MODIFIER LETTER CENTRED RIGHT HALF RING
02D3 # Sk (˓) MODIFIER LETTER CENTRED LEFT HALF RING
02D4 # Sk (˔) MODIFIER LETTER UP TACK
02D5 # Sk (˕) MODIFIER LETTER DOWN TACK
02D6 # Sk (˖) MODIFIER LETTER PLUS SIGN
02D7 # Sk (˗) MODIFIER LETTER MINUS SIGN
02DE # Sk (˞) MODIFIER LETTER RHOTIC HOOK
02DF # Sk (˟) MODIFIER LETTER CROSS ACCENT
02E5 # Sk (˥) MODIFIER LETTER EXTRA-HIGH TONE BAR
02E6 # Sk (˦) MODIFIER LETTER HIGH TONE BAR
02E7 # Sk (˧) MODIFIER LETTER MID TONE BAR
02E8 # Sk (˨) MODIFIER LETTER LOW TONE BAR
02E9 # Sk (˩) MODIFIER LETTER EXTRA-LOW TONE BAR
02EA # Sk (˪) MODIFIER LETTER YIN DEPARTING TONE MARK
02EB # Sk (˫) MODIFIER LETTER YANG DEPARTING TONE MARK
02EC # Sk (ˬ) MODIFIER LETTER VOICING
02ED # Sk (˭) MODIFIER LETTER UNASPIRATED
02EF # Sk (˯) MODIFIER LETTER LOW DOWN ARROWHEAD
02F0 # Sk (˰) MODIFIER LETTER LOW UP ARROWHEAD
02F1 # Sk (˱) MODIFIER LETTER LOW LEFT ARROWHEAD
02F2 # Sk (˲) MODIFIER LETTER LOW RIGHT ARROWHEAD
02F3 # Sk (˳) MODIFIER LETTER LOW RING
02F4 # Sk (˴) MODIFIER LETTER MIDDLE GRAVE ACCENT
02F5 # Sk (˵) MODIFIER LETTER MIDDLE DOUBLE GRAVE ACCENT
02F6 # Sk (˶) MODIFIER LETTER MIDDLE DOUBLE ACUTE ACCENT
02F7 # Sk (˷) MODIFIER LETTER LOW TILDE
02F8 # Sk (˸) MODIFIER LETTER RAISED COLON
02F9 # Sk (˹) MODIFIER LETTER BEGIN HIGH TONE
02FA # Sk (˺) MODIFIER LETTER END HIGH TONE
02FB # Sk (˻) MODIFIER LETTER BEGIN LOW TONE
02FC # Sk (˼) MODIFIER LETTER END LOW TONE
02FD # Sk (˽) MODIFIER LETTER SHELF
02FE # Sk (˾) MODIFIER LETTER OPEN SHELF
02FF # Sk (˿) MODIFIER LETTER LOW LEFT ARROW
A700 # Sk (꜀) MODIFIER LETTER CHINESE TONE YIN PING
A701 # Sk (꜁) MODIFIER LETTER CHINESE TONE YANG PING
A702 # Sk (꜂) MODIFIER LETTER CHINESE TONE YIN SHANG
A703 # Sk (꜃) MODIFIER LETTER CHINESE TONE YANG SHANG
A704 # Sk (꜄) MODIFIER LETTER CHINESE TONE YIN QU
A705 # Sk (꜅) MODIFIER LETTER CHINESE TONE YANG QU
A706 # Sk (꜆) MODIFIER LETTER CHINESE TONE YIN RU
A707 # Sk (꜇) MODIFIER LETTER CHINESE TONE YANG RU
A708 # Sk (꜈) MODIFIER LETTER EXTRA-HIGH DOTTED TONE BAR
A709 # Sk (꜉) MODIFIER LETTER HIGH DOTTED TONE BAR
A70A # Sk (꜊) MODIFIER LETTER MID DOTTED TONE BAR
A70B # Sk (꜋) MODIFIER LETTER LOW DOTTED TONE BAR
A70C # Sk (꜌) MODIFIER LETTER EXTRA-LOW DOTTED TONE BAR
A70D # Sk (꜍) MODIFIER LETTER EXTRA-HIGH DOTTED LEFT-STEM TONE BAR
A70E # Sk (꜎) MODIFIER LETTER HIGH DOTTED LEFT-STEM TONE BAR
A70F # Sk (꜏) MODIFIER LETTER MID DOTTED LEFT-STEM TONE BAR
A710 # Sk (꜐) MODIFIER LETTER LOW DOTTED LEFT-STEM TONE BAR
A711 # Sk (꜑) MODIFIER LETTER EXTRA-LOW DOTTED LEFT-STEM TONE BAR
A712 # Sk (꜒) MODIFIER LETTER EXTRA-HIGH LEFT-STEM TONE BAR
A713 # Sk (꜓) MODIFIER LETTER HIGH LEFT-STEM TONE BAR
A714 # Sk (꜔) MODIFIER LETTER MID LEFT-STEM TONE BAR
A715 # Sk (꜕) MODIFIER LETTER LOW LEFT-STEM TONE BAR
A716 # Sk (꜖) MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR
A720 # Sk (꜠) MODIFIER LETTER STRESS AND HIGH TONE
A721 # Sk (꜡) MODIFIER LETTER STRESS AND LOW TONE
# Total code points: 63
In [$gc:Lm], but not in [$Name:«.*MODIFIER LETTER.*»] :
02C7 # Lm (ˇ) CARON
037A # Lm (ͺ) GREEK YPOGEGRAMMENI
0640 # Lm (ـ) ARABIC TATWEEL
06E5 # Lm (ۥ) ARABIC SMALL WAW
06E6 # Lm (ۦ) ARABIC SMALL YEH
07F4 # Lm (ߴ) NKO HIGH TONE APOSTROPHE
07F5 # Lm (ߵ) NKO LOW TONE APOSTROPHE
07FA # Lm (ߺ) NKO LAJANYALAN
0E46 # Lm (ๆ) THAI CHARACTER MAIYAMOK
0EC6 # Lm (ໆ) LAO KO LA
17D7 # Lm (ៗ) KHMER SIGN LEK TOO
1843 # Lm (ᡃ) MONGOLIAN LETTER TODO LONG VOWEL SIGN
2090 # Lm (ₐ) LATIN SUBSCRIPT SMALL LETTER A
2091 # Lm (ₑ) LATIN SUBSCRIPT SMALL LETTER E
2092 # Lm (ₒ) LATIN SUBSCRIPT SMALL LETTER O
2093 # Lm (ₓ) LATIN SUBSCRIPT SMALL LETTER X
2094 # Lm (ₔ) LATIN SUBSCRIPT SMALL LETTER SCHWA
3005 # Lm (々) IDEOGRAPHIC ITERATION MARK
3031 # Lm (〱) VERTICAL KANA REPEAT MARK
3032 # Lm (〲) VERTICAL KANA REPEAT WITH VOICED SOUND MARK
3033 # Lm (〳) VERTICAL KANA REPEAT MARK UPPER HALF
3034 # Lm (〴) VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF
3035 # Lm (〵) VERTICAL KANA REPEAT MARK LOWER HALF
303B # Lm (〻) VERTICAL IDEOGRAPHIC ITERATION MARK
309D # Lm (ゝ) HIRAGANA ITERATION MARK
309E # Lm (ゞ) HIRAGANA VOICED ITERATION MARK
30FC # Lm (ー) KATAKANA-HIRAGANA PROLONGED SOUND MARK
30FD # Lm (ヽ) KATAKANA ITERATION MARK
30FE # Lm (ヾ) KATAKANA VOICED ITERATION MARK
A015 # Lm (ꀕ) YI SYLLABLE WU
FF70 # Lm (ー) HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E # Lm (゙) HALFWIDTH KATAKANA VOICED SOUND MARK
FF9F # Lm (゚) HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
# Total code points: 33
In both [$gc:Lm], and in [$Name:«.*MODIFIER LETTER.*»] :
02B0 # Lm (ʰ) MODIFIER LETTER SMALL H
02B1 # Lm (ʱ) MODIFIER LETTER SMALL H WITH HOOK
02B2 # Lm (ʲ) MODIFIER LETTER SMALL J
02B3 # Lm (ʳ) MODIFIER LETTER SMALL R
02B4 # Lm (ʴ) MODIFIER LETTER SMALL TURNED R
02B5 # Lm (ʵ) MODIFIER LETTER SMALL TURNED R WITH HOOK
02B6 # Lm (ʶ) MODIFIER LETTER SMALL CAPITAL INVERTED R
02B7 # Lm (ʷ) MODIFIER LETTER SMALL W
02B8 # Lm (ʸ) MODIFIER LETTER SMALL Y
02B9 # Lm (ʹ) MODIFIER LETTER PRIME
02BA # Lm (ʺ) MODIFIER LETTER DOUBLE PRIME
02BB # Lm (ʻ) MODIFIER LETTER TURNED COMMA
02BC # Lm (ʼ) MODIFIER LETTER APOSTROPHE
02BD # Lm (ʽ) MODIFIER LETTER REVERSED COMMA
02BE # Lm (ʾ) MODIFIER LETTER RIGHT HALF RING
02BF # Lm (ʿ) MODIFIER LETTER LEFT HALF RING
02C0 # Lm (ˀ) MODIFIER LETTER GLOTTAL STOP
02C1 # Lm (ˁ) MODIFIER LETTER REVERSED GLOTTAL STOP
02C6 # Lm (ˆ) MODIFIER LETTER CIRCUMFLEX ACCENT
02C8 # Lm (ˈ) MODIFIER LETTER VERTICAL LINE
02C9 # Lm (ˉ) MODIFIER LETTER MACRON
02CA # Lm (ˊ) MODIFIER LETTER ACUTE ACCENT
02CB # Lm (ˋ) MODIFIER LETTER GRAVE ACCENT
02CC # Lm (ˌ) MODIFIER LETTER LOW VERTICAL LINE
02CD # Lm (ˍ) MODIFIER LETTER LOW MACRON
02CE # Lm (ˎ) MODIFIER LETTER LOW GRAVE ACCENT
02CF # Lm (ˏ) MODIFIER LETTER LOW ACUTE ACCENT
02D0 # Lm (ː) MODIFIER LETTER TRIANGULAR COLON
02D1 # Lm (ˑ) MODIFIER LETTER HALF TRIANGULAR COLON
02E0 # Lm (ˠ) MODIFIER LETTER SMALL GAMMA
02E1 # Lm (ˡ) MODIFIER LETTER SMALL L
02E2 # Lm (ˢ) MODIFIER LETTER SMALL S
02E3 # Lm (ˣ) MODIFIER LETTER SMALL X
02E4 # Lm (ˤ) MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
02EE # Lm (ˮ) MODIFIER LETTER DOUBLE APOSTROPHE
0559 # Lm (ՙ) ARMENIAN MODIFIER LETTER LEFT HALF RING
10FC # Lm (ჼ) MODIFIER LETTER GEORGIAN NAR
1D2C # Lm (ᴬ) MODIFIER LETTER CAPITAL A
1D2D # Lm (ᴭ) MODIFIER LETTER CAPITAL AE
1D2E # Lm (ᴮ) MODIFIER LETTER CAPITAL B
1D2F # Lm (ᴯ) MODIFIER LETTER CAPITAL BARRED B
1D30 # Lm (ᴰ) MODIFIER LETTER CAPITAL D
1D31 # Lm (ᴱ) MODIFIER LETTER CAPITAL E
1D32 # Lm (ᴲ) MODIFIER LETTER CAPITAL REVERSED E
1D33 # Lm (ᴳ) MODIFIER LETTER CAPITAL G
1D34 # Lm (ᴴ) MODIFIER LETTER CAPITAL H
1D35 # Lm (ᴵ) MODIFIER LETTER CAPITAL I
1D36 # Lm (ᴶ) MODIFIER LETTER CAPITAL J
1D37 # Lm (ᴷ) MODIFIER LETTER CAPITAL K
1D38 # Lm (ᴸ) MODIFIER LETTER CAPITAL L
1D39 # Lm (ᴹ) MODIFIER LETTER CAPITAL M
1D3A # Lm (ᴺ) MODIFIER LETTER CAPITAL N
1D3B # Lm (ᴻ) MODIFIER LETTER CAPITAL REVERSED N
1D3C # Lm (ᴼ) MODIFIER LETTER CAPITAL O
1D3D # Lm (ᴽ) MODIFIER LETTER CAPITAL OU
1D3E # Lm (ᴾ) MODIFIER LETTER CAPITAL P
1D3F # Lm (ᴿ) MODIFIER LETTER CAPITAL R
1D40 # Lm (ᵀ) MODIFIER LETTER CAPITAL T
1D41 # Lm (ᵁ) MODIFIER LETTER CAPITAL U
1D42 # Lm (ᵂ) MODIFIER LETTER CAPITAL W
1D43 # Lm (ᵃ) MODIFIER LETTER SMALL A
1D44 # Lm (ᵄ) MODIFIER LETTER SMALL TURNED A
1D45 # Lm (ᵅ) MODIFIER LETTER SMALL ALPHA
1D46 # Lm (ᵆ) MODIFIER LETTER SMALL TURNED AE
1D47 # Lm (ᵇ) MODIFIER LETTER SMALL B
1D48 # Lm (ᵈ) MODIFIER LETTER SMALL D
1D49 # Lm (ᵉ) MODIFIER LETTER SMALL E
1D4A # Lm (ᵊ) MODIFIER LETTER SMALL SCHWA
1D4B # Lm (ᵋ) MODIFIER LETTER SMALL OPEN E
1D4C # Lm (ᵌ) MODIFIER LETTER SMALL TURNED OPEN E
1D4D # Lm (ᵍ) MODIFIER LETTER SMALL G
1D4E # Lm (ᵎ) MODIFIER LETTER SMALL TURNED I
1D4F # Lm (ᵏ) MODIFIER LETTER SMALL K
1D50 # Lm (ᵐ) MODIFIER LETTER SMALL M
1D51 # Lm (ᵑ) MODIFIER LETTER SMALL ENG
1D52 # Lm (ᵒ) MODIFIER LETTER SMALL O
1D53 # Lm (ᵓ) MODIFIER LETTER SMALL OPEN O
1D54 # Lm (ᵔ) MODIFIER LETTER SMALL TOP HALF O
1D55 # Lm (ᵕ) MODIFIER LETTER SMALL BOTTOM HALF O
1D56 # Lm (ᵖ) MODIFIER LETTER SMALL P
1D57 # Lm (ᵗ) MODIFIER LETTER SMALL T
1D58 # Lm (ᵘ) MODIFIER LETTER SMALL U
1D59 # Lm (ᵙ) MODIFIER LETTER SMALL SIDEWAYS U
1D5A # Lm (ᵚ) MODIFIER LETTER SMALL TURNED M
1D5B # Lm (ᵛ) MODIFIER LETTER SMALL V
1D5C # Lm (ᵜ) MODIFIER LETTER SMALL AIN
1D5D # Lm (ᵝ) MODIFIER LETTER SMALL BETA
1D5E # Lm (ᵞ) MODIFIER LETTER SMALL GREEK GAMMA
1D5F # Lm (ᵟ) MODIFIER LETTER SMALL DELTA
1D60 # Lm (ᵠ) MODIFIER LETTER SMALL GREEK PHI
1D61 # Lm (ᵡ) MODIFIER LETTER SMALL CHI
1D78 # Lm (ᵸ) MODIFIER LETTER CYRILLIC EN
1D9B # Lm (ᶛ) MODIFIER LETTER SMALL TURNED ALPHA
1D9C # Lm (ᶜ) MODIFIER LETTER SMALL C
1D9D # Lm (ᶝ) MODIFIER LETTER SMALL C WITH CURL
1D9E # Lm (ᶞ) MODIFIER LETTER SMALL ETH
1D9F # Lm (ᶟ) MODIFIER LETTER SMALL REVERSED OPEN E
1DA0 # Lm (ᶠ) MODIFIER LETTER SMALL F
1DA1 # Lm (ᶡ) MODIFIER LETTER SMALL DOTLESS J WITH STROKE
1DA2 # Lm (ᶢ) MODIFIER LETTER SMALL SCRIPT G
1DA3 # Lm (ᶣ) MODIFIER LETTER SMALL TURNED H
1DA4 # Lm (ᶤ) MODIFIER LETTER SMALL I WITH STROKE
1DA5 # Lm (ᶥ) MODIFIER LETTER SMALL IOTA
1DA6 # Lm (ᶦ) MODIFIER LETTER SMALL CAPITAL I
1DA7 # Lm (ᶧ) MODIFIER LETTER SMALL CAPITAL I WITH STROKE
1DA8 # Lm (ᶨ) MODIFIER LETTER SMALL J WITH CROSSED-TAIL
1DA9 # Lm (ᶩ) MODIFIER LETTER SMALL L WITH RETROFLEX HOOK
1DAA # Lm (ᶪ) MODIFIER LETTER SMALL L WITH PALATAL HOOK
1DAB # Lm (ᶫ) MODIFIER LETTER SMALL CAPITAL L
1DAC # Lm (ᶬ) MODIFIER LETTER SMALL M WITH HOOK
1DAD # Lm (ᶭ) MODIFIER LETTER SMALL TURNED M WITH LONG LEG
1DAE # Lm (ᶮ) MODIFIER LETTER SMALL N WITH LEFT HOOK
1DAF # Lm (ᶯ) MODIFIER LETTER SMALL N WITH RETROFLEX HOOK
1DB0 # Lm (ᶰ) MODIFIER LETTER SMALL CAPITAL N
1DB1 # Lm (ᶱ) MODIFIER LETTER SMALL BARRED O
1DB2 # Lm (ᶲ) MODIFIER LETTER SMALL PHI
1DB3 # Lm (ᶳ) MODIFIER LETTER SMALL S WITH HOOK
1DB4 # Lm (ᶴ) MODIFIER LETTER SMALL ESH
1DB5 # Lm (ᶵ) MODIFIER LETTER SMALL T WITH PALATAL HOOK
1DB6 # Lm (ᶶ) MODIFIER LETTER SMALL U BAR
1DB7 # Lm (ᶷ) MODIFIER LETTER SMALL UPSILON
1DB8 # Lm (ᶸ) MODIFIER LETTER SMALL CAPITAL U
1DB9 # Lm (ᶹ) MODIFIER LETTER SMALL V WITH HOOK
1DBA # Lm (ᶺ) MODIFIER LETTER SMALL TURNED V
1DBB # Lm (ᶻ) MODIFIER LETTER SMALL Z
1DBC # Lm (ᶼ) MODIFIER LETTER SMALL Z WITH RETROFLEX HOOK
1DBD # Lm (ᶽ) MODIFIER LETTER SMALL Z WITH CURL
1DBE # Lm (ᶾ) MODIFIER LETTER SMALL EZH
1DBF # Lm (ᶿ) MODIFIER LETTER SMALL THETA
2D6F # Lm (ⵯ) TIFINAGH MODIFIER LETTER LABIALIZATION MARK
A717 # Lm (ꜗ) MODIFIER LETTER DOT VERTICAL BAR
A718 # Lm (ꜘ) MODIFIER LETTER DOT SLASH
A719 # Lm (ꜙ) MODIFIER LETTER DOT HORIZONTAL BAR
A71A # Lm (ꜚ) MODIFIER LETTER LOWER RIGHT CORNER ANGLE
# Total code points: 134
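The four sets above were expressed as UnicodeSet-style queries. As a rough equivalent, the breakdown can be recomputed with the following sketch, which assumes Python's standard `unicodedata` module rather than the tool used to produce the lists; the function name `breakdown` is hypothetical. Because Python ships a later version of the Unicode Character Database than the one this document was written against, the totals it prints should be expected to differ from 36, 63, 33 and 134.

```python
import sys
import unicodedata
from collections import defaultdict


def breakdown(categories=("Sk", "Lm", "Mc")):
    """Group code points by (general category, 'MODIFIER LETTER' in name)."""
    buckets = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        gc = unicodedata.category(ch)
        if gc not in categories:
            continue
        # Fall back to an empty name if this build has no name for the character.
        name = unicodedata.name(ch, "")
        buckets[(gc, "MODIFIER LETTER" in name)].append(cp)
    return buckets


buckets = breakdown()
print("Unicode data version:", unicodedata.unidata_version)
for (gc, in_name), cps in sorted(buckets.items()):
    label = "with" if in_name else "without"
    print(f"gc={gc}, {label} MODIFIER LETTER in name: {len(cps)} code points")
```

Printing `buckets[("Sk", True)]` as hexadecimal code points with their names gives a list in the same shape as the ones in this section, which makes it straightforward to compare against a newer database when reviewing the assignments.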