L2/09-219R2
Subject: Operational Properties for Action 115A008
Date: 2009-05-14
From: Mark Davis
To: UTC

I had the following action from the UTC:

115 A008    Mark Davis    Produce updated proposal for the "operationally X-cased" properties, with more background.    L2/08-157    2008-05-20    2008-05-20

Here is the proposal, after revising the names and comments as per discussion in the UTC on May 13.
(Link to working doc: http://www.macchiato.com/unicode/action-115a008)

DerivedCoreProperties.txt

Add the following 7 properties (the short name is in parens).

# Derived Property: Cased (Cased)
#  As defined by Unicode Standard Definition D120
#  C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.
0041..005A    ; Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Cased # L&       FEMININE ORDINAL INDICATOR
00B5          ; Cased # L&       MICRO SIGN
00BA          ; Cased # L&       MASCULINE ORDINAL INDICATOR
00C0..00D6    ; Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
...

# Derived Property: Case_Ignorable (CI)
#  As defined by Unicode Standard Definition D121
#  C is defined to be case-ignorable if
#    Word_Break(C) = MidLetter or MidNumLet, or
#    General_Category(C) = Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
0027          ; Case_Ignorable # Po       APOSTROPHE
002E          ; Case_Ignorable # Po       FULL STOP
003A          ; Case_Ignorable # Po       COLON
005E          ; Case_Ignorable # Sk       CIRCUMFLEX ACCENT
0060          ; Case_Ignorable # Sk       GRAVE ACCENT
00A8          ; Case_Ignorable # Sk       DIAERESIS
...
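The two per-character definitions above (D120 and D121) can be sketched in Python as follows. This is an illustration under stated assumptions, not the derivation used to generate the data file: the function names `cased` and `case_ignorable` are hypothetical; the sketch assumes that CPython's `str.islower()` and `str.isupper()` on a one-character string reflect the Lowercase and Uppercase properties; and because the standard library does not expose Word_Break, the set `WORD_BREAK_MID` is only a partial stand-in holding the three code points visible in the excerpt. Python also ships a later Unicode version than this proposal targets, so individual code points may differ.

```python
import unicodedata

# ASSUMPTION: partial stand-in for Word_Break = MidLetter / MidNumLet.
# Only the three Po code points shown in the excerpt above are included;
# a full implementation would load WordBreakProperty.txt instead.
WORD_BREAK_MID = {0x0027, 0x002E, 0x003A}

# General_Category values named in definition D121.
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}


def cased(ch: str) -> bool:
    """D120 sketch: Lowercase or Uppercase property, or gc = Titlecase_Letter."""
    # ASSUMPTION: islower()/isupper() on a single character track the
    # Lowercase / Uppercase properties in CPython.
    return ch.islower() or ch.isupper() or unicodedata.category(ch) == "Lt"


def case_ignorable(ch: str) -> bool:
    """D121 sketch: Word_Break is MidLetter/MidNumLet, or gc in the listed set."""
    return (ord(ch) in WORD_BREAK_MID
            or unicodedata.category(ch) in CASE_IGNORABLE_GC)


if __name__ == "__main__":
    for ch in ["A", "a", "\u01C5", "1", "'", "^", "\u0301"]:
        print(f"U+{ord(ch):04X} cased={cased(ch)} "
              f"case_ignorable={case_ignorable(ch)}")
```

Keeping the Word_Break values in a separate set makes it obvious which part of the sketch is incomplete and where real property data would be plugged in.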

# Derived Property: Is_Lowercase (ILC)
#  As defined by Unicode Standard Definition D124
#  isLowercase(X) is true when toLowercase(toNFD(X)) = toNFD(X)
0000..001F    ; Is_Lowercase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Lowercase # Zs       SPACE
0021..0023    ; Is_Lowercase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Lowercase # Sc       DOLLAR SIGN
0025..0027    ; Is_Lowercase # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Lowercase # Ps       LEFT PARENTHESIS
0029          ; Is_Lowercase # Pe       RIGHT PARENTHESIS
002A          ; Is_Lowercase # Po       ASTERISK
002B          ; Is_Lowercase # Sm       PLUS SIGN
...

# Derived Property: Is_Uppercase (IUC)
#  As defined by Unicode Standard Definition D125
#  isUppercase(X) is true when toUppercase(toNFD(X)) = toNFD(X)
0000..001F    ; Is_Uppercase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Uppercase # Zs       SPACE
0021..0023    ; Is_Uppercase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Uppercase # Sc       DOLLAR SIGN
0025..0027    ; Is_Uppercase # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Uppercase # Ps       LEFT PARENTHESIS
0029          ; Is_Uppercase # Pe       RIGHT PARENTHESIS
...

# Derived Property: Is_Titlecase (ITC)
#  As defined by Unicode Standard Definition D126
#  isTitlecase(X) is true when toTitlecase(toNFD(X)) = toNFD(X)
0000..001F    ; Is_Titlecase # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Titlecase # Zs       SPACE
0021..0023    ; Is_Titlecase # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Titlecase # Sc       DOLLAR SIGN
0025..0027    ; Is_Titlecase # Po   [3] PERCENT SIGN..APOSTROPHE
...
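The definitions above, like the Is_Casefolded and Is_Cased definitions in this proposal, share one pattern: normalize X to NFD, apply a case operation, and check whether anything changed. A minimal Python sketch of that pattern follows. The function names are hypothetical, and the sketch assumes that Python's built-in `str.lower()`, `str.upper()`, `str.title()`, and `str.casefold()` are acceptable stand-ins for toLowercase, toUppercase, toTitlecase, and toCasefold. In particular, `str.title()` uses Python's own notion of word boundaries, so it is only a rough match for toTitlecase on longer strings, and Python's Unicode data is newer than the version this proposal targets.

```python
import unicodedata


def _nfd(s: str) -> str:
    """toNFD(X) from the definitions."""
    return unicodedata.normalize("NFD", s)


def is_lowercase(s: str) -> bool:
    d = _nfd(s)
    return d.lower() == d       # toLowercase(toNFD(X)) = toNFD(X)


def is_uppercase(s: str) -> bool:
    d = _nfd(s)
    return d.upper() == d       # toUppercase(toNFD(X)) = toNFD(X)


def is_titlecase(s: str) -> bool:
    d = _nfd(s)
    # ASSUMPTION: str.title() approximates toTitlecase.
    return d.title() == d


def is_casefolded(s: str) -> bool:
    d = _nfd(s)
    return d.casefold() == d    # toCasefold(toNFD(X)) = toNFD(X)


def is_cased(s: str) -> bool:
    # True when at least one of the three case tests is false.
    return not (is_lowercase(s) and is_uppercase(s) and is_titlecase(s))


if __name__ == "__main__":
    for s in ["a", "A", " ", "$"]:
        print(repr(s), is_lowercase(s), is_uppercase(s),
              is_titlecase(s), is_casefolded(s), is_cased(s))
```

Under these definitions a caseless character such as SPACE passes all of the "is X-case" tests, which matches the data listings, where controls and punctuation appear under Is_Lowercase, Is_Uppercase, and Is_Titlecase alike.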

# Derived Property: Is_Casefolded (ICF)
#  As defined by Unicode Standard Definition D127
#  isCasefolded(X) is true when toCasefold(toNFD(X)) = toNFD(X)
0000..001F    ; Is_Casefolded # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_Casefolded # Zs       SPACE
0021..0023    ; Is_Casefolded # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_Casefolded # Sc       DOLLAR SIGN
0025..0027    ; Is_Casefolded # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_Casefolded # Ps       LEFT PARENTHESIS
0029          ; Is_Casefolded # Pe       RIGHT PARENTHESIS
002A          ; Is_Casefolded # Po       ASTERISK
002B          ; Is_Casefolded # Sm       PLUS SIGN
...

# Derived Property: Is_Cased (IC)
#  As defined by Unicode Standard Definition D128
#  isCased(X) is true when isLowercase(X) is false, or isUppercase(X) is false, or isTitlecase(X) is false
0041..005A    ; Is_Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Is_Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00B5          ; Is_Cased # L&       MICRO SIGN
00C0..00D6    ; Is_Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6    ; Is_Cased # L&  [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS
00F8..0137    ; Is_Cased # L&  [64] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER K WITH CEDILLA
0139..018C    ; Is_Cased # L&  [84] LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER D WITH TOPBAR
...

DerivedNormalizationProperties.txt

Add the following 2 properties:

# Derived Property: NFKC_Casefold (CCIF)
#  As defined by CaseFolding, removing Default_Ignorable_Code_Points, then transforming by NFKC; then repeating
#  All code points not explicitly listed for NFKC_Casefold
#  have a value equal to the code point.
0041          ; NFKC_Casefold; 0061       # L&       LATIN CAPITAL LETTER A
0042          ; NFKC_Casefold; 0062       # L&       LATIN CAPITAL LETTER B
0043          ; NFKC_Casefold; 0063       # L&       LATIN CAPITAL LETTER C
0044          ; NFKC_Casefold; 0064       # L&       LATIN CAPITAL LETTER D
0045          ; NFKC_Casefold; 0065       # L&       LATIN CAPITAL LETTER E
0046          ; NFKC_Casefold; 0066       # L&       LATIN CAPITAL LETTER F
0047          ; NFKC_Casefold; 0067       # L&       LATIN CAPITAL LETTER G
...
005A          ; NFKC_Casefold; 007A       # L&       LATIN CAPITAL LETTER Z
00A0          ; NFKC_Casefold; 0020       # Zs       NO-BREAK SPACE
00A8          ; NFKC_Casefold; 0020 0308  # Sk       DIAERESIS
00AA          ; NFKC_Casefold; 0061       # L&       FEMININE ORDINAL INDICATOR
00AD          ; NFKC_Casefold;            # Cf       SOFT HYPHEN
00AF          ; NFKC_Casefold; 0020 0304  # Sk       MACRON
00B2          ; NFKC_Casefold; 0032       # No       SUPERSCRIPT TWO
00B3          ; NFKC_Casefold; 0033       # No       SUPERSCRIPT THREE
00B4          ; NFKC_Casefold; 0020 0301  # Sk       ACUTE ACCENT
...

# Derived Property: Is_NFKC_Casefold (isCCIF)
#  As defined by cp = NFKC_Casefold(cp)
0000..001F    ; Is_NFKC_Casefold # Cc  [32] <control-0000>..<control-001F>
0020          ; Is_NFKC_Casefold # Zs       SPACE
0021..0023    ; Is_NFKC_Casefold # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Is_NFKC_Casefold # Sc       DOLLAR SIGN
0025..0027    ; Is_NFKC_Casefold # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Is_NFKC_Casefold # Ps       LEFT PARENTHESIS
0029          ; Is_NFKC_Casefold # Pe       RIGHT PARENTHESIS
002A          ; Is_NFKC_Casefold # Po       ASTERISK
...

Text

Add references to these properties under the corresponding definitions, plus in UAX #31.
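
The NFKC_Casefold derivation described in the DerivedNormalizationProperties.txt section (case fold, remove Default_Ignorable_Code_Points, apply NFKC, then repeat) can be sketched in Python as below. This is an illustration under stated assumptions rather than the generator for the data file: the function names are hypothetical, "repeating" is interpreted here as iterating until the result stops changing, and since the standard library does not expose Default_Ignorable_Code_Point, `DEFAULT_IGNORABLE` is a partial stand-in containing only U+00AD SOFT HYPHEN, the one such code point visible in the excerpt.

```python
import unicodedata

# ASSUMPTION: partial stand-in for Default_Ignorable_Code_Point.
# Only U+00AD (shown mapping to nothing in the excerpt) is included;
# a full implementation would load DerivedCoreProperties.txt.
DEFAULT_IGNORABLE = {0x00AD}


def nfkc_casefold(s: str, max_rounds: int = 10) -> str:
    """Case fold, drop default ignorables, apply NFKC; repeat until stable."""
    for _ in range(max_rounds):
        folded = s.casefold()
        stripped = "".join(c for c in folded
                           if ord(c) not in DEFAULT_IGNORABLE)
        result = unicodedata.normalize("NFKC", stripped)
        if result == s:          # fixed point reached
            return result
        s = result
    return s


def is_nfkc_casefold(cp: int) -> bool:
    """Is_NFKC_Casefold sketch: cp = NFKC_Casefold(cp)."""
    ch = chr(cp)
    return nfkc_casefold(ch) == ch


if __name__ == "__main__":
    for cp in [0x0041, 0x00A0, 0x00A8, 0x00AA, 0x00AD, 0x00B2, 0x0020]:
        mapped = " ".join(f"{ord(c):04X}" for c in nfkc_casefold(chr(cp)))
        print(f"{cp:04X} ; NFKC_Casefold; {mapped}  "
              f"is_nfkc_casefold={is_nfkc_casefold(cp)}")
```

The `max_rounds` bound is only a safety net for the loop; in practice the mapping is expected to stabilize after a small number of passes.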