L2/09-219
Subject: Operational Properties for Action 115A008
Date: 2009-05-09
From: Mark Davis
To: UTC
I had the following action from the UTC:
115 A008 Mark Davis Produce updated proposal for the "operationally X-cased" properties, with more background.
L2/08-157 2008-05-20 2008-05-20
Here is the proposal.
DerivedCoreProperties.txt
Add the following 6 properties (the short name is in parens).
# Derived Property: Cased (Cased)
#  As defined by Unicode Standard Definition D120
#  C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.
0041..005A    ; Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Cased # L&       FEMININE ORDINAL INDICATOR
00B5          ; Cased # L&       MICRO SIGN
00BA          ; Cased # L&       MASCULINE ORDINAL INDICATOR
00C0..00D6    ; Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
...
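The D120 derivation above can be approximated per code point with Python's standard library. This is only a sketch: it assumes that, for a one-character string, CPython's str.islower() and str.isupper() reflect the Lowercase and Uppercase properties of the interpreter's bundled Unicode database, so results track that database's version rather than the version this proposal targets. The function name is_cased is hypothetical.

```python
import unicodedata


def is_cased(ch: str) -> bool:
    """Approximate D120 for a single code point (hypothetical helper).

    Assumption: on a one-character string, islower()/isupper() mirror the
    Lowercase/Uppercase properties of Python's bundled Unicode data.
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return (
        ch.islower()
        or ch.isupper()
        or unicodedata.category(ch) == "Lt"  # Titlecase_Letter
    )
```

For example, is_cased("A") and is_cased("\u01c5") (a Titlecase_Letter) hold, while is_cased("1") does not.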
# Derived Property: Case_Ignoreable (CI)
#  As defined by Unicode Standard Definition D121
#  C is defined to be case-ignorable if
#    Word_Break(C) = MidLetter or MidNumLet, or
#    General_Category(C) = Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
0027          ; Case_Ignoreable # Po       APOSTROPHE
002E          ; Case_Ignoreable # Po       FULL STOP
003A          ; Case_Ignoreable # Po       COLON
005E          ; Case_Ignoreable # Sk       CIRCUMFLEX ACCENT
0060          ; Case_Ignoreable # Sk       GRAVE ACCENT
00A8          ; Case_Ignoreable # Sk       DIAERESIS
...
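The two-part test in D121 can be sketched as follows. Python's standard library does not expose Word_Break, so the set SAMPLE_WORD_BREAK_MID below is a hypothetical stand-in containing only the three Po code points listed in the excerpt above (0027, 002E, 003A); a real derivation would read the full MidLetter and MidNumLet sets from the Word_Break property data. The function name is likewise an assumption.

```python
import unicodedata

# Hypothetical, deliberately partial stand-in for Word_Break = MidLetter
# or MidNumLet: just the Po code points shown in the excerpt above.
SAMPLE_WORD_BREAK_MID = frozenset({0x0027, 0x002E, 0x003A})

# General_Category values named in D121.
IGNORABLE_CATEGORIES = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})


def is_case_ignorable(ch: str, word_break_mid=SAMPLE_WORD_BREAK_MID) -> bool:
    """Sketch of D121 for a single code point."""
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return (
        ord(ch) in word_break_mid
        or unicodedata.category(ch) in IGNORABLE_CATEGORIES
    )
```

Passing the Word_Break set as a parameter keeps the sketch usable once the complete property data is available.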
# Derived Property: Operationally_Lowercased (OLC)
#  As defined by Unicode Standard Definition D124
#  isLowercase(X) is true when toLowercase(Y) = Y
0000..001F    ; Operationally_Lowercased # Cc  [32] <control-0000>..<control-001F>
0020          ; Operationally_Lowercased # Zs       SPACE
0021..0023    ; Operationally_Lowercased # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Operationally_Lowercased # Sc       DOLLAR SIGN
0025..0027    ; Operationally_Lowercased # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Operationally_Lowercased # Ps       LEFT PARENTHESIS
0029          ; Operationally_Lowercased # Pe       RIGHT PARENTHESIS
002A          ; Operationally_Lowercased # Po       ASTERISK
002B          ; Operationally_Lowercased # Sm       PLUS SIGN
...
# Derived Property: Operationally_Uppercased (OUC)
#  As defined by Unicode Standard Definition D125
#  isUppercase(X) is true when toUppercase(Y) = Y
0000..001F    ; Operationally_Uppercased # Cc  [32] <control-0000>..<control-001F>
0020          ; Operationally_Uppercased # Zs       SPACE
0021..0023    ; Operationally_Uppercased # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Operationally_Uppercased # Sc       DOLLAR SIGN
0025..0027    ; Operationally_Uppercased # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Operationally_Uppercased # Ps       LEFT PARENTHESIS
0029          ; Operationally_Uppercased # Pe       RIGHT PARENTHESIS
...
# Derived Property: Operationally_Titlecased (OTC)
#  As defined by Unicode Standard Definition D126
#  isTitlecase(X) is true when toTitlecase(Y) = Y
0000..001F    ; Operationally_Titlecased # Cc  [32] <control-0000>..<control-001F>
0020          ; Operationally_Titlecased # Zs       SPACE
0021..0023    ; Operationally_Titlecased # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Operationally_Titlecased # Sc       DOLLAR SIGN
0025..0027    ; Operationally_Titlecased # Po   [3] PERCENT SIGN..APOSTROPHE
...
# Derived Property: Operationally_Casefolded (OCF)
#  As defined by Unicode Standard Definition D127
#  isCasefolded(X) is true when toCasefold(Y) = Y
0000..001F    ; Operationally_Casefolded # Cc  [32] <control-0000>..<control-001F>
0020          ; Operationally_Casefolded # Zs       SPACE
0021..0023    ; Operationally_Casefolded # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; Operationally_Casefolded # Sc       DOLLAR SIGN
0025..0027    ; Operationally_Casefolded # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; Operationally_Casefolded # Ps       LEFT PARENTHESIS
0029          ; Operationally_Casefolded # Pe       RIGHT PARENTHESIS
002A          ; Operationally_Casefolded # Po       ASTERISK
002B          ; Operationally_Casefolded # Sm       PLUS SIGN
...
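The four definitions above (D124 through D127) share one shape: a string is "operationally X-cased" when the corresponding case mapping leaves it unchanged. The sketch below illustrates that shape. It assumes Y is NFD(X), as in the standard's case-detection definitions, and it uses Python's str.lower, str.upper, str.title and str.casefold as stand-ins for toLowercase, toUppercase, toTitlecase and toCasefold. Those stand-ins follow the interpreter's bundled Unicode data, and str.title in particular uses Python's own word segmentation, so this is only dependable for single code points. All names are hypothetical.

```python
import unicodedata

# Stand-ins for the standard's case mappings (an approximation).
MAPPINGS = {
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "casefold": str.casefold,
}


def is_operationally(kind: str, x: str) -> bool:
    """True when the chosen mapping is a no-op on Y = NFD(x)."""
    y = unicodedata.normalize("NFD", x)
    return MAPPINGS[kind](y) == y
```

Under these assumptions, is_operationally("lower", "$") is true because "$" has no case mappings at all, which is why uncased characters such as controls and punctuation appear in every one of the four lists.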
# Derived Property: Operationally_Cased (OC)
#  As defined by Unicode Standard Definition D128
#  isCased(X) is true when isLowercase(X) is false, or isUppercase(X) is false, or isTitlecase(X) is false
0041..005A    ; Operationally_Cased # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Operationally_Cased # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00B5          ; Operationally_Cased # L&       MICRO SIGN
00C0..00D6    ; Operationally_Cased # L&  [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6    ; Operationally_Cased # L&  [31] LATIN CAPITAL LETTER O WITH STROKE..LATIN SMALL LETTER O WITH DIAERESIS
00F8..0137    ; Operationally_Cased # L&  [64] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER K WITH CEDILLA
0139..018C    ; Operationally_Cased # L&  [84] LATIN CAPITAL LETTER L WITH ACUTE..LATIN SMALL LETTER D WITH TOPBAR
...
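D128 says, in effect, that a string is operationally cased when at least one of the lowercase, uppercase or titlecase mappings changes it. A minimal sketch, again assuming Y = NFD(X) and using Python's built-in string methods as approximations of the standard's mappings (reasonable for single code points, not guaranteed for longer strings); the function name is hypothetical:

```python
import unicodedata


def is_operationally_cased(x: str) -> bool:
    """Sketch of D128: some case mapping changes Y = NFD(x)."""
    y = unicodedata.normalize("NFD", x)
    return y.lower() != y or y.upper() != y or y.title() != y
```

This shows how Cased and Operationally_Cased can differ. FEMININE ORDINAL INDICATOR (00AA) appears in the Cased excerpt but not in the Operationally_Cased excerpt, and in this sketch is_operationally_cased("\u00aa") is false because the character has no case mappings, whereas MICRO SIGN (00B5) uppercases to another character and so is operationally cased.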
DerivedNormalizationProperties.txt
Add the following 2 properties:
# Derived Property: CaseCompatIgnorableFold (CCIF)
#  As defined by CaseFolding, removing Default_Ignorable_Code_Points, then transforming by NFKC; then repeating
#  All code points not explicitly listed for CaseCompatIgnorableFold
#  have a value equal to the code point.
0041          ; CaseCompatIgnorableFold; 0061      # L&  LATIN CAPITAL LETTER A
0042          ; CaseCompatIgnorableFold; 0062      # L&  LATIN CAPITAL LETTER B
0043          ; CaseCompatIgnorableFold; 0063      # L&  LATIN CAPITAL LETTER C
0044          ; CaseCompatIgnorableFold; 0064      # L&  LATIN CAPITAL LETTER D
0045          ; CaseCompatIgnorableFold; 0065      # L&  LATIN CAPITAL LETTER E
0046          ; CaseCompatIgnorableFold; 0066      # L&  LATIN CAPITAL LETTER F
0047          ; CaseCompatIgnorableFold; 0067      # L&  LATIN CAPITAL LETTER G
...
005A          ; CaseCompatIgnorableFold; 007A      # L&  LATIN CAPITAL LETTER Z
00A0          ; CaseCompatIgnorableFold; 0020      # Zs  NO-BREAK SPACE
00A8          ; CaseCompatIgnorableFold; 0020 0308 # Sk  DIAERESIS
00AA          ; CaseCompatIgnorableFold; 0061      # L&  FEMININE ORDINAL INDICATOR
00AD          ; CaseCompatIgnorableFold;           # Cf  SOFT HYPHEN
00AF          ; CaseCompatIgnorableFold; 0020 0304 # Sk  MACRON
00B2          ; CaseCompatIgnorableFold; 0032      # No  SUPERSCRIPT TWO
00B3          ; CaseCompatIgnorableFold; 0033      # No  SUPERSCRIPT THREE
00B4          ; CaseCompatIgnorableFold; 0020 0301 # Sk  ACUTE ACCENT
...
# Derived Property: CaseCompatIgnorableFolded (isCCIF)
#  As defined by cp = CaseCompatIgnorableFold(cp)
0000..001F    ; CaseCompatIgnorableFolded # Cc  [32] <control-0000>..<control-001F>
0020          ; CaseCompatIgnorableFolded # Zs       SPACE
0021..0023    ; CaseCompatIgnorableFolded # Po   [3] EXCLAMATION MARK..NUMBER SIGN
0024          ; CaseCompatIgnorableFolded # Sc       DOLLAR SIGN
0025..0027    ; CaseCompatIgnorableFolded # Po   [3] PERCENT SIGN..APOSTROPHE
0028          ; CaseCompatIgnorableFolded # Ps       LEFT PARENTHESIS
0029          ; CaseCompatIgnorableFolded # Pe       RIGHT PARENTHESIS
002A          ; CaseCompatIgnorableFolded # Po       ASTERISK
...
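The two normalization properties above can be sketched together: apply case folding, drop default-ignorable code points, apply NFKC, and repeat until the result stops changing; a code point is then "folded" when the mapping leaves it alone. Python's standard library does not expose Default_Ignorable_Code_Point, so SAMPLE_DEFAULT_IGNORABLE below is a hypothetical, partial stand-in (it includes SOFT HYPHEN, which the excerpt maps to the empty string). str.casefold stands in for CaseFolding, and all function names are assumptions.

```python
import unicodedata

# Hypothetical partial stand-in for Default_Ignorable_Code_Point.
SAMPLE_DEFAULT_IGNORABLE = frozenset(
    {0x00AD, 0x034F, 0x200C, 0x200D, 0x2060, 0xFEFF}
)


def cci_fold(s: str, ignorable=SAMPLE_DEFAULT_IGNORABLE) -> str:
    """Case-fold, strip ignorables, NFKC; repeat until stable."""
    while True:
        folded = s.casefold()
        stripped = "".join(ch for ch in folded if ord(ch) not in ignorable)
        result = unicodedata.normalize("NFKC", stripped)
        if result == s:
            return result
        s = result


def is_cci_folded(cp: str, ignorable=SAMPLE_DEFAULT_IGNORABLE) -> bool:
    """True when cp = CaseCompatIgnorableFold(cp)."""
    return cci_fold(cp, ignorable) == cp
```

With these assumptions the sketch reproduces the rows in the excerpt: cci_fold("A") gives "a", cci_fold("\u00a8") gives SPACE followed by COMBINING DIAERESIS, and cci_fold("\u00ad") gives the empty string.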
Text
Add references to these properties under the corresponding definitions, and also in UAX #31.