We discussed at the last UTC meeting the issue of stability under case folding, especially with regard to caseless programming-language identifiers and similar formats or processes (such as Stringprep). The issue is this: if we change the case folding behavior of assigned characters, that could cause problems for implementations and specifications that need to maintain backward compatibility. While implementations and specifications can deal with these problems themselves, it would clearly simplify matters for them if they could depend on stability.
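To make the concern concrete, here is a minimal Python sketch of caseless identifier matching. It assumes a simplified key (NFKC, then full case folding via `str.casefold()`, then NFKC again). This is an illustration, not the exact Stringprep mapping. The `OLDER` table is hypothetical and simulates an older version in which U+023A had no case folding. Python's own Unicode data, which is much later than 4.1, folds U+023A to U+2C65.

```python
import unicodedata
from typing import Mapping, Optional


def casefold_with(s: str, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Full case folding via str.casefold(), except for characters in `overrides`.

    `overrides` is a hypothetical per-character table used to simulate the
    case foldings of a different Unicode version.
    """
    table = overrides or {}
    return "".join(table.get(ch, ch.casefold()) for ch in s)


def identifier_key(s: str, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Simplified caseless-identifier key: NFKC, case fold, then NFKC again."""
    s = unicodedata.normalize("NFKC", s)
    s = casefold_with(s, overrides)
    return unicodedata.normalize("NFKC", s)


# Hypothetical "older version": U+023A folds to itself.
OLDER = {"\u023A": "\u023A"}

stored = identifier_key("\u023Ax", OLDER)  # key saved by an older implementation
lookup = identifier_key("\u023Ax")         # same identifier, current Python data
print(stored == lookup)  # False: the key for the same identifier changed
```

Once a key like `stored` has been persisted (in a symbol table, a registry, or a compiled artifact), a later change in folding means the same identifier no longer matches itself. This is the backward-compatibility hazard described above.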
The case foldings that are in Unicode have been reviewed extensively, so from that standpoint there should be no problem in adding a stability policy guaranteeing that they do not change. The open issue is to examine characters that do not currently have case foldings, but could conceivably need them if the other half of a case pair is added later. Because the case folding of a character is normally its toLowercase() value, we can focus on only those characters that are either Uppercase or Titlecase. And because the Unicode recommendation for caseless identifiers is to use NFKC (which Stringprep also follows), we need only focus on those characters that are in NFKC, that is, characters left unchanged by NFKC normalization.
There are only a small number of such characters, the six below:
U+023A LATIN CAPITAL LETTER A WITH STROKE new in 4.1
U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE new in 4.1
U+03FD GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL new in 4.1
U+03FE GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL new in 4.1
U+03FF GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL new in 4.1
U+04C0 CYRILLIC LETTER PALOCHKA already in 4.0
# Total code points: 6
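The selection criteria above can be sketched in Python. This sketch makes two assumptions. First, general category Lu or Lt stands in for the Uppercase and Titlecase properties; the Uppercase property also covers some other characters, such as Roman numerals. Second, `str.casefold()` stands in for the case folding. Python's `unicodedata` module ships a much later Unicode version than 4.1, so the output will not reproduce the six characters listed. The function names are illustrative.

```python
import sys
import unicodedata
from typing import List


def lacks_case_folding(ch: str) -> bool:
    """True if ch is an Lu/Lt character, unchanged by NFKC, that case folding leaves as-is."""
    if unicodedata.category(ch) not in ("Lu", "Lt"):
        return False  # approximation of the Uppercase/Titlecase properties
    if unicodedata.normalize("NFKC", ch) != ch:
        return False  # not in NFKC, so it never survives into an NFKC identifier
    return ch.casefold() == ch  # no case folding: a candidate for review


def find_candidates() -> List[str]:
    """Scan every code point in Python's Unicode data for review candidates."""
    return [chr(cp) for cp in range(sys.maxunicode + 1) if lacks_case_folding(chr(cp))]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for ch in find_candidates():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```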
Once we have resolved any issues with the characters above, and have set up a careful review process for future cases, we could put in place a stability policy such as the following, for Unicode versions later than some version X.
D1. For all strings S containing characters only from Unicode Versions A and B
Of the above characters, here is a preliminary assessment:
U+023A LATIN CAPITAL LETTER A WITH STROKE new in 4.1
U+023E LATIN CAPITAL LETTER T WITH DIAGONAL STROKE new in 4.1
For these two characters, we should add the corresponding lowercase characters as soon as possible, because there is a good chance that they will be needed in the future.
U+03FD GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL new in 4.1
U+03FE GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL new in 4.1
U+03FF GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL new in 4.1
U+04C0 CYRILLIC LETTER PALOCHKA already in 4.0
These four characters are, and will remain, caseless; we should change their general category to Lo to reflect that.
If this assessment is agreed to, then we could offer a slightly weaker stability guarantee during the interim period before we can add the two letters:
D1'. For all strings S containing characters only from Unicode Versions A and B (and excluding U+023A and U+023E)
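Suppose the guarantee is that case foldings do not change between versions, which is the intent stated at the start of this section. Because default case folding maps each code point independently of its neighbors, such a string-level guarantee reduces to a per-code-point comparison of the two versions' folding data. Here is a minimal sketch under that assumption. Each version's foldings are supplied as a hypothetical table mapping code points to folded strings, where absent entries fold to themselves. The sketch also takes the set of code points assigned in the older version. The `excluded` parameter plays the role of the U+023A/U+023E carve-out in D1'.

```python
from typing import Dict, Iterable, List

# Hypothetical representation of one version's case folding data:
# code point -> folded string. Code points absent from the table fold to themselves.
FoldingTable = Dict[int, str]


def fold(table: FoldingTable, cp: int) -> str:
    return table.get(cp, chr(cp))


def changed_foldings(
    old: FoldingTable,
    new: FoldingTable,
    assigned_in_old: Iterable[int],
    excluded: Iterable[int] = (),
) -> List[int]:
    """Code points assigned in the older version whose case folding differs in the newer one."""
    skip = set(excluded)
    return sorted(
        cp for cp in assigned_in_old
        if cp not in skip and fold(old, cp) != fold(new, cp)
    )


# Toy data: in the "new" version, U+023A and U+023E gain lowercase partners.
assigned = [0x0041, 0x0061, 0x023A, 0x023E]
old = {0x0041: "a"}
new = {0x0041: "a", 0x023A: "\u2c65", 0x023E: "\u2c66"}

print([f"U+{cp:04X}" for cp in changed_foldings(old, new, assigned)])
# ['U+023A', 'U+023E']
print(changed_foldings(old, new, assigned, excluded=[0x023A, 0x023E]))
# []
```

An empty result means that, for the code points checked, the newer version preserves every folding of the older one.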
B. For comparison, here are the Uppercase or Titlecase characters that are not in NFKC and do not have a case folding.
03D2..03D4 # L& [3] GREEK UPSILON WITH HOOK SYMBOL..GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
2102 # L& DOUBLE-STRUCK CAPITAL C
2107 # L& EULER CONSTANT
210B..210D # L& [3] SCRIPT CAPITAL H..DOUBLE-STRUCK CAPITAL H
2110..2112 # L& [3] SCRIPT CAPITAL I..SCRIPT CAPITAL L
2115 # L& DOUBLE-STRUCK CAPITAL N
2119..211D # L& [5] DOUBLE-STRUCK CAPITAL P..DOUBLE-STRUCK CAPITAL R
2124 # L& DOUBLE-STRUCK CAPITAL Z
2128 # L& BLACK-LETTER CAPITAL Z
212C..212D # L& [2] SCRIPT CAPITAL B..BLACK-LETTER CAPITAL C
2130..2131 # L& [2] SCRIPT CAPITAL E..SCRIPT CAPITAL F
2133 # L& SCRIPT CAPITAL M
213E..213F # L& [2] DOUBLE-STRUCK CAPITAL GAMMA..DOUBLE-STRUCK CAPITAL PI
2145 # L& DOUBLE-STRUCK ITALIC CAPITAL D
1D400..1D419 # L& [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z
1D434..1D44D # L& [26] MATHEMATICAL ITALIC CAPITAL A..MATHEMATICAL ITALIC CAPITAL Z
1D468..1D481 # L& [26] MATHEMATICAL BOLD ITALIC CAPITAL A..MATHEMATICAL BOLD ITALIC CAPITAL Z
1D49C # L& MATHEMATICAL SCRIPT CAPITAL A
1D49E..1D49F # L& [2] MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL SCRIPT CAPITAL D
1D4A2 # L& MATHEMATICAL SCRIPT CAPITAL G
1D4A5..1D4A6 # L& [2] MATHEMATICAL SCRIPT CAPITAL J..MATHEMATICAL SCRIPT CAPITAL K
1D4A9..1D4AC # L& [4] MATHEMATICAL SCRIPT CAPITAL N..MATHEMATICAL SCRIPT CAPITAL Q
1D4AE..1D4B5 # L& [8] MATHEMATICAL SCRIPT CAPITAL S..MATHEMATICAL SCRIPT CAPITAL Z
1D4D0..1D4E9 # L& [26] MATHEMATICAL BOLD SCRIPT CAPITAL A..MATHEMATICAL BOLD SCRIPT CAPITAL Z
1D504..1D505 # L& [2] MATHEMATICAL FRAKTUR CAPITAL A..MATHEMATICAL FRAKTUR CAPITAL B
1D507..1D50A # L& [4] MATHEMATICAL FRAKTUR CAPITAL D..MATHEMATICAL FRAKTUR CAPITAL G
1D50D..1D514 # L& [8] MATHEMATICAL FRAKTUR CAPITAL J..MATHEMATICAL FRAKTUR CAPITAL Q
1D516..1D51C # L& [7] MATHEMATICAL FRAKTUR CAPITAL S..MATHEMATICAL FRAKTUR CAPITAL Y
1D538..1D539 # L& [2] MATHEMATICAL DOUBLE-STRUCK CAPITAL A..MATHEMATICAL DOUBLE-STRUCK CAPITAL B
1D53B..1D53E # L& [4] MATHEMATICAL DOUBLE-STRUCK CAPITAL D..MATHEMATICAL DOUBLE-STRUCK CAPITAL G
1D540..1D544 # L& [5] MATHEMATICAL DOUBLE-STRUCK CAPITAL I..MATHEMATICAL DOUBLE-STRUCK CAPITAL M
1D546 # L& MATHEMATICAL DOUBLE-STRUCK CAPITAL O
1D54A..1D550 # L& [7] MATHEMATICAL DOUBLE-STRUCK CAPITAL S..MATHEMATICAL DOUBLE-STRUCK CAPITAL Y
1D56C..1D585 # L& [26] MATHEMATICAL BOLD FRAKTUR CAPITAL A..MATHEMATICAL BOLD FRAKTUR CAPITAL Z
1D5A0..1D5B9 # L& [26] MATHEMATICAL SANS-SERIF CAPITAL A..MATHEMATICAL SANS-SERIF CAPITAL Z
1D5D4..1D5ED # L& [26] MATHEMATICAL SANS-SERIF BOLD CAPITAL A..MATHEMATICAL SANS-SERIF BOLD CAPITAL Z
1D608..1D621 # L& [26] MATHEMATICAL SANS-SERIF ITALIC CAPITAL A..MATHEMATICAL SANS-SERIF ITALIC CAPITAL Z
1D63C..1D655 # L& [26] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL Z
1D670..1D689 # L& [26] MATHEMATICAL MONOSPACE CAPITAL A..MATHEMATICAL MONOSPACE CAPITAL Z
1D6A8..1D6C0 # L& [25] MATHEMATICAL BOLD CAPITAL ALPHA..MATHEMATICAL BOLD CAPITAL OMEGA
1D6E2..1D6FA # L& [25] MATHEMATICAL ITALIC CAPITAL ALPHA..MATHEMATICAL ITALIC CAPITAL OMEGA
1D71C..1D734 # L& [25] MATHEMATICAL BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL BOLD ITALIC CAPITAL OMEGA
1D756..1D76E # L& [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA
1D790..1D7A8 # L& [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA
# Total code points: 470
C. For comparison, here are the remaining Uppercase or Titlecase characters: those that do have a case folding. To keep the list short, a four-dot ellipsis (....) between two entries represents every other code point between the two values.
0041..005A # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00C0..00D6 # L& [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00DE # L& [7] LATIN CAPITAL LETTER O WITH STROKE..LATIN CAPITAL LETTER THORN
0100 # L& LATIN CAPITAL LETTER A WITH MACRON
....
0136 # L& LATIN CAPITAL LETTER K WITH CEDILLA
0139 # L& LATIN CAPITAL LETTER L WITH ACUTE
....
0147 # L& LATIN CAPITAL LETTER N WITH CARON
014A # L& LATIN CAPITAL LETTER ENG
....
0176 # L& LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0178..0179 # L& [2] LATIN CAPITAL LETTER Y WITH DIAERESIS..LATIN CAPITAL LETTER Z WITH ACUTE
017B # L& LATIN CAPITAL LETTER Z WITH DOT ABOVE
017D # L& LATIN CAPITAL LETTER Z WITH CARON
0181..0182 # L& [2] LATIN CAPITAL LETTER B WITH HOOK..LATIN CAPITAL LETTER B WITH TOPBAR
0184 # L& LATIN CAPITAL LETTER TONE SIX
0186..0187 # L& [2] LATIN CAPITAL LETTER OPEN O..LATIN CAPITAL LETTER C WITH HOOK
0189..018B # L& [3] LATIN CAPITAL LETTER AFRICAN D..LATIN CAPITAL LETTER D WITH TOPBAR
018E..0191 # L& [4] LATIN CAPITAL LETTER REVERSED E..LATIN CAPITAL LETTER F WITH HOOK
0193..0194 # L& [2] LATIN CAPITAL LETTER G WITH HOOK..LATIN CAPITAL LETTER GAMMA
0196..0198 # L& [3] LATIN CAPITAL LETTER IOTA..LATIN CAPITAL LETTER K WITH HOOK
019C..019D # L& [2] LATIN CAPITAL LETTER TURNED M..LATIN CAPITAL LETTER N WITH LEFT HOOK
019F..01A0 # L& [2] LATIN CAPITAL LETTER O WITH MIDDLE TILDE..LATIN CAPITAL LETTER O WITH HORN
01A2 # L& LATIN CAPITAL LETTER OI
01A4 # L& LATIN CAPITAL LETTER P WITH HOOK
01A6..01A7 # L& [2] LATIN LETTER YR..LATIN CAPITAL LETTER TONE TWO
01A9 # L& LATIN CAPITAL LETTER ESH
01AC # L& LATIN CAPITAL LETTER T WITH HOOK
01AE..01AF # L& [2] LATIN CAPITAL LETTER T WITH RETROFLEX HOOK..LATIN CAPITAL LETTER U WITH HORN
01B1..01B3 # L& [3] LATIN CAPITAL LETTER UPSILON..LATIN CAPITAL LETTER Y WITH HOOK
01B5 # L& LATIN CAPITAL LETTER Z WITH STROKE
01B7..01B8 # L& [2] LATIN CAPITAL LETTER EZH..LATIN CAPITAL LETTER EZH REVERSED
01BC # L& LATIN CAPITAL LETTER TONE FIVE
01C4..01C5 # L& [2] LATIN CAPITAL LETTER DZ WITH CARON..LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
01C7..01C8 # L& [2] LATIN CAPITAL LETTER LJ..LATIN CAPITAL LETTER L WITH SMALL LETTER J
01CA..01CB # L& [2] LATIN CAPITAL LETTER NJ..LATIN CAPITAL LETTER N WITH SMALL LETTER J
01CD # L& LATIN CAPITAL LETTER A WITH CARON
....
01DB # L& LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE
01DE # L& LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON
....
01EE # L& LATIN CAPITAL LETTER EZH WITH CARON
01F1..01F2 # L& [2] LATIN CAPITAL LETTER DZ..LATIN CAPITAL LETTER D WITH SMALL LETTER Z
01F4 # L& LATIN CAPITAL LETTER G WITH ACUTE
01F6..01F8 # L& [3] LATIN CAPITAL LETTER HWAIR..LATIN CAPITAL LETTER N WITH GRAVE
01FA # L& LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
....
0232 # L& LATIN CAPITAL LETTER Y WITH MACRON
023B # L& LATIN CAPITAL LETTER C WITH STROKE
023D # L& LATIN CAPITAL LETTER L WITH BAR
0241 # L& LATIN CAPITAL LETTER GLOTTAL STOP
0386 # L& GREEK CAPITAL LETTER ALPHA WITH TONOS
0388..038A # L& [3] GREEK CAPITAL LETTER EPSILON WITH TONOS..GREEK CAPITAL LETTER IOTA WITH TONOS
038C # L& GREEK CAPITAL LETTER OMICRON WITH TONOS
038E..038F # L& [2] GREEK CAPITAL LETTER UPSILON WITH TONOS..GREEK CAPITAL LETTER OMEGA WITH TONOS
0391..03A1 # L& [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
03A3..03AB # L& [9] GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
03D8 # L& GREEK LETTER ARCHAIC KOPPA
....
03EE # L& COPTIC CAPITAL LETTER DEI
03F4 # L& GREEK CAPITAL THETA SYMBOL
03F7 # L& GREEK CAPITAL LETTER SHO
03F9..03FA # L& [2] GREEK CAPITAL LUNATE SIGMA SYMBOL..GREEK CAPITAL LETTER SAN
0400..042F # L& [48] CYRILLIC CAPITAL LETTER IE WITH GRAVE..CYRILLIC CAPITAL LETTER YA
0460 # L& CYRILLIC CAPITAL LETTER OMEGA
....
0480 # L& CYRILLIC CAPITAL LETTER KOPPA
048A # L& CYRILLIC CAPITAL LETTER SHORT I WITH TAIL
....
04BE # L& CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
04C1 # L& CYRILLIC CAPITAL LETTER ZHE WITH BREVE
....
04CD # L& CYRILLIC CAPITAL LETTER EM WITH TAIL
04D0 # L& CYRILLIC CAPITAL LETTER A WITH BREVE
....
04F8 # L& CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
0500 # L& CYRILLIC CAPITAL LETTER KOMI DE
....
050E # L& CYRILLIC CAPITAL LETTER KOMI TJE
0531..0556 # L& [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH
10A0..10C5 # L& [38] GEORGIAN CAPITAL LETTER AN..GEORGIAN CAPITAL LETTER HOE
1E00 # L& LATIN CAPITAL LETTER A WITH RING BELOW
....
1EF8 # L& LATIN CAPITAL LETTER Y WITH TILDE
1F08..1F0F # L& [8] GREEK CAPITAL LETTER ALPHA WITH PSILI..GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI
1F18..1F1D # L& [6] GREEK CAPITAL LETTER EPSILON WITH PSILI..GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA
1F28..1F2F # L& [8] GREEK CAPITAL LETTER ETA WITH PSILI..GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI
1F38..1F3F # L& [8] GREEK CAPITAL LETTER IOTA WITH PSILI..GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI
1F48..1F4D # L& [6] GREEK CAPITAL LETTER OMICRON WITH PSILI..GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA
1F59 # L& GREEK CAPITAL LETTER UPSILON WITH DASIA
....
1F5F # L& GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI
1F68..1F6F # L& [8] GREEK CAPITAL LETTER OMEGA WITH PSILI..GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI
1F88..1F8F # L& [8] GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1F98..1F9F # L& [8] GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FA8..1FAF # L& [8] GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI
1FB8..1FBC # L& [5] GREEK CAPITAL LETTER ALPHA WITH VRACHY..GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FC8..1FCC # L& [5] GREEK CAPITAL LETTER EPSILON WITH VARIA..GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FD8..1FDB # L& [4] GREEK CAPITAL LETTER IOTA WITH VRACHY..GREEK CAPITAL LETTER IOTA WITH OXIA
1FE8..1FEC # L& [5] GREEK CAPITAL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA
1FF8..1FFC # L& [5] GREEK CAPITAL LETTER OMICRON WITH VARIA..GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
2126 # L& OHM SIGN
212A..212B # L& [2] KELVIN SIGN..ANGSTROM SIGN
2160..216F # Nl [16] ROMAN NUMERAL ONE..ROMAN NUMERAL ONE THOUSAND
24B6..24CF # So [26] CIRCLED LATIN CAPITAL LETTER A..CIRCLED LATIN CAPITAL LETTER Z
2C00..2C2E # L& [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C80 # L& COPTIC CAPITAL LETTER ALFA
....
2CE2 # L& COPTIC CAPITAL LETTER OLD NUBIAN WAU
FF21..FF3A # L& [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z
10400..10427 # L& [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW
# Total code points: 893
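The listing conventions used in lists B and C can be expanded mechanically. A `..` range covers contiguous code points. A four-dot ellipsis stands for every other code point between the two entries it connects. Expanding them this way can help when checking the "Total code points" lines. The following is a minimal Python sketch. It assumes the ellipsis appears on a line of its own between the two entries it connects. `expand_listing` is a hypothetical helper name.

```python
import re
from typing import List

# "XXXX" or "XXXX..YYYY", followed by the "#" comment field.
ENTRY = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*#")


def expand_listing(lines: List[str]) -> List[int]:
    """Expand a listing into the code points it covers.

    "XXXX..YYYY # ..." is a contiguous range. A line holding only "...." between
    two single entries stands for every other code point between them.
    """
    result: List[int] = []
    pending_ellipsis = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("# Total"):
            continue  # skip blank lines and the summary line
        if line == "....":
            if not result:
                raise ValueError("ellipsis before any entry")
            pending_ellipsis = True
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        if pending_ellipsis:
            prev = result[-1]
            if (start - prev) % 2:
                raise ValueError(f"odd gap between U+{prev:04X} and U+{start:04X}")
            result.extend(range(prev + 2, start, 2))  # alternating code points in between
            pending_ellipsis = False
        result.extend(range(start, end + 1))
    return result


sample = [
    "0041..005A # L& [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "0100 # L& LATIN CAPITAL LETTER A WITH MACRON",
    "....",
    "0136 # L& LATIN CAPITAL LETTER K WITH CEDILLA",
]
cps = expand_listing(sample)
print(len(cps))  # 54: A..Z (26) plus 0100, 0102, ..., 0136 (28)
```

Applied to a full listing, `len(expand_listing(...))` gives a count that can be compared with the listing's "Total code points" line.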