From: Philippe VERDY (verdy_p@wanadoo.fr)
Date: Sun Apr 03 2005 - 15:57:24 CST
"Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> wrote:
> I said *new* CJK compatibility ideographs. U+FA70..U+FAD9 were
> unassigned in earlier versions of Unicode.
I have just checked the new UCD, and you're right (but the previous message saying that NFC was changed in 4.1 was wrong, or severely misleading, and I was not its author).
So for reference, the new characters from the UCD are:
# Newly assigned in Unicode 4.1.0 (XXX, 2005)
0237..0241 ; 4.1 # [11] LATIN SMALL LETTER DOTLESS J..LATIN CAPITAL LETTER GLOTTAL STOP
0358..035C ; 4.1 # [5] COMBINING DOT ABOVE RIGHT..COMBINING DOUBLE BREVE BELOW
03FC..03FF ; 4.1 # [4] GREEK RHO WITH STROKE SYMBOL..GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
04F6..04F7 ; 4.1 # [2] CYRILLIC CAPITAL LETTER GHE WITH DESCENDER..CYRILLIC SMALL LETTER GHE WITH DESCENDER
05A2 ; 4.1 # HEBREW ACCENT ATNAH HAFUKH
05C5..05C7 ; 4.1 # [3] HEBREW MARK LOWER DOT..HEBREW POINT QAMATS QATAN
060B ; 4.1 # AFGHANI SIGN
061E ; 4.1 # ARABIC TRIPLE DOT PUNCTUATION MARK
0659..065E ; 4.1 # [6] ARABIC ZWARAKAY..ARABIC FATHA WITH TWO DOTS
0750..076D ; 4.1 # [30] ARABIC LETTER BEH WITH THREE DOTS HORIZONTALLY BELOW..ARABIC LETTER SEEN WITH TWO DOTS VERTICALLY ABOVE
097D ; 4.1 # DEVANAGARI LETTER GLOTTAL STOP
09CE ; 4.1 # BENGALI LETTER KHANDA TA
0BB6 ; 4.1 # TAMIL LETTER SHA
0BE6 ; 4.1 # TAMIL DIGIT ZERO
0FD0..0FD1 ; 4.1 # [2] TIBETAN MARK BSKA- SHOG GI MGO RGYAN..TIBETAN MARK MNYAM YIG GI MGO RGYAN
10F9..10FA ; 4.1 # [2] GEORGIAN LETTER TURNED GAN..GEORGIAN LETTER AIN
10FC ; 4.1 # MODIFIER LETTER GEORGIAN NAR
1207 ; 4.1 # ETHIOPIC SYLLABLE HOA
1247 ; 4.1 # ETHIOPIC SYLLABLE QOA
1287 ; 4.1 # ETHIOPIC SYLLABLE XOA
12AF ; 4.1 # ETHIOPIC SYLLABLE KOA
12CF ; 4.1 # ETHIOPIC SYLLABLE WOA
12EF ; 4.1 # ETHIOPIC SYLLABLE YOA
130F ; 4.1 # ETHIOPIC SYLLABLE GOA
131F ; 4.1 # ETHIOPIC SYLLABLE GGWAA
1347 ; 4.1 # ETHIOPIC SYLLABLE TZOA
135F..1360 ; 4.1 # [2] ETHIOPIC COMBINING GEMINATION MARK..ETHIOPIC SECTION MARK
1380..1399 ; 4.1 # [26] ETHIOPIC SYLLABLE SEBATBEIT MWA..ETHIOPIC TONAL MARK KURT
1980..19A9 ; 4.1 # [42] NEW TAI LUE LETTER HIGH QA..NEW TAI LUE LETTER LOW XVA
19B0..19C9 ; 4.1 # [26] NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW TAI LUE TONE MARK-2
19D0..19D9 ; 4.1 # [10] NEW TAI LUE DIGIT ZERO..NEW TAI LUE DIGIT NINE
19DE..19DF ; 4.1 # [2] NEW TAI LUE SIGN LAE..NEW TAI LUE SIGN LAEV
1A00..1A1B ; 4.1 # [28] BUGINESE LETTER KA..BUGINESE VOWEL SIGN AE
1A1E..1A1F ; 4.1 # [2] BUGINESE PALLAWA..BUGINESE END OF SECTION
1D6C..1DC3 ; 4.1 # [88] LATIN SMALL LETTER B WITH MIDDLE TILDE..COMBINING SUSPENSION MARK
2055..2056 ; 4.1 # [2] FLOWER PUNCTUATION MARK..THREE DOT PUNCTUATION
2058..205E ; 4.1 # [7] FOUR DOT PUNCTUATION..VERTICAL FOUR DOTS
2090..2094 ; 4.1 # [5] LATIN SUBSCRIPT SMALL LETTER A..LATIN SUBSCRIPT SMALL LETTER SCHWA
20B2..20B5 ; 4.1 # [4] GUARANI SIGN..CEDI SIGN
20EB ; 4.1 # COMBINING LONG DOUBLE SOLIDUS OVERLAY
213C ; 4.1 # DOUBLE-STRUCK SMALL PI
214C ; 4.1 # PER SIGN
23D1..23DB ; 4.1 # [11] METRICAL BREVE..FUSE
2618 ; 4.1 # SHAMROCK
267E..267F ; 4.1 # [2] PERMANENT PAPER SIGN..WHEELCHAIR SYMBOL
2692..269C ; 4.1 # [11] HAMMER AND PICK..FLEUR-DE-LIS
26A2..26B1 ; 4.1 # [16] DOUBLED FEMALE SIGN..FUNERAL URN
27C0..27C6 ; 4.1 # [7] THREE DIMENSIONAL ANGLE..RIGHT S-SHAPED BAG DELIMITER
2B0E..2B13 ; 4.1 # [6] RIGHTWARDS ARROW WITH TIP DOWNWARDS..SQUARE WITH BOTTOM HALF BLACK
2C00..2C2E ; 4.1 # [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C30..2C5E ; 4.1 # [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
2C80..2CEA ; 4.1 # [107] COPTIC CAPITAL LETTER ALFA..COPTIC SYMBOL SHIMA SIMA
2CF9..2D25 ; 4.1 # [45] COPTIC OLD NUBIAN FULL STOP..GEORGIAN SMALL LETTER HOE
2D30..2D65 ; 4.1 # [54] TIFINAGH LETTER YA..TIFINAGH LETTER YAZZ
2D6F ; 4.1 # TIFINAGH MODIFIER LETTER LABIALIZATION MARK
2D80..2D96 ; 4.1 # [23] ETHIOPIC SYLLABLE LOA..ETHIOPIC SYLLABLE GGWE
2DA0..2DA6 ; 4.1 # [7] ETHIOPIC SYLLABLE SSA..ETHIOPIC SYLLABLE SSO
2DA8..2DAE ; 4.1 # [7] ETHIOPIC SYLLABLE CCA..ETHIOPIC SYLLABLE CCO
2DB0..2DB6 ; 4.1 # [7] ETHIOPIC SYLLABLE ZZA..ETHIOPIC SYLLABLE ZZO
2DB8..2DBE ; 4.1 # [7] ETHIOPIC SYLLABLE CCHA..ETHIOPIC SYLLABLE CCHO
2DC0..2DC6 ; 4.1 # [7] ETHIOPIC SYLLABLE QYA..ETHIOPIC SYLLABLE QYO
2DC8..2DCE ; 4.1 # [7] ETHIOPIC SYLLABLE KYA..ETHIOPIC SYLLABLE KYO
2DD0..2DD6 ; 4.1 # [7] ETHIOPIC SYLLABLE XYA..ETHIOPIC SYLLABLE XYO
2DD8..2DDE ; 4.1 # [7] ETHIOPIC SYLLABLE GYA..ETHIOPIC SYLLABLE GYO
2E00..2E17 ; 4.1 # [24] RIGHT ANGLE SUBSTITUTION MARKER..DOUBLE OBLIQUE HYPHEN
2E1C..2E1D ; 4.1 # [2] LEFT LOW PARAPHRASE BRACKET..RIGHT LOW PARAPHRASE BRACKET
31C0..31CF ; 4.1 # [16] CJK STROKE T..CJK STROKE N
327E ; 4.1 # CIRCLED HANGUL IEUNG U
9FA6..9FBB ; 4.1 # [22] CJK UNIFIED IDEOGRAPH-9FA6..CJK UNIFIED IDEOGRAPH-9FBB
A700..A716 ; 4.1 # [23] MODIFIER LETTER CHINESE TONE YIN PING..MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR
A800..A82B ; 4.1 # [44] SYLOTI NAGRI LETTER A..SYLOTI NAGRI POETRY MARK-4
FA70..FAD9 ; 4.1 # [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
FE10..FE19 ; 4.1 # [10] PRESENTATION FORM FOR VERTICAL COMMA..PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
10140..1018A ; 4.1 # [75] GREEK ACROPHONIC ATTIC ONE QUARTER..GREEK ZERO SIGN
103A0..103C3 ; 4.1 # [36] OLD PERSIAN SIGN A..OLD PERSIAN SIGN HA
103C8..103D5 ; 4.1 # [14] OLD PERSIAN SIGN AURAMAZDAA..OLD PERSIAN NUMBER HUNDRED
10A00..10A03 ; 4.1 # [4] KHAROSHTHI LETTER A..KHAROSHTHI VOWEL SIGN VOCALIC R
10A05..10A06 ; 4.1 # [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O
10A0C..10A13 ; 4.1 # [8] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI LETTER GHA
10A15..10A17 ; 4.1 # [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA
10A19..10A33 ; 4.1 # [27] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER TTTHA
10A38..10A3A ; 4.1 # [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
10A3F..10A47 ; 4.1 # [9] KHAROSHTHI VIRAMA..KHAROSHTHI NUMBER ONE THOUSAND
10A50..10A58 ; 4.1 # [9] KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCTUATION LINES
1D200..1D245 ; 4.1 # [70] GREEK VOCAL NOTATION SYMBOL-1..GREEK MUSICAL LEIMMA
1D6A4..1D6A5 ; 4.1 # [2] MATHEMATICAL ITALIC SMALL DOTLESS I..MATHEMATICAL ITALIC SMALL DOTLESS J
# Total code points: 1273
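As a sanity check on a listing like the one above, here is a minimal Python sketch that parses lines in this "range ; age # [count] names" layout and confirms that each bracketed count matches the size of its range. The function name parse_age_line and the four sample lines are only illustrative (the sample is copied from the list above, not the full data), so the total it computes covers the sample alone.

```python
import re

# Matches "XXXX" or "XXXX..YYYY", then "; age", then "#" and an optional "[count]".
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\S+)\s*#\s*(?:\[(\d+)\])?"
)

def parse_age_line(line):
    """Return (first, last, age, declared_count), or None for comments/blank lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    # A single code point carries no "[n]" annotation, so its count is 1.
    declared = int(m.group(4)) if m.group(4) else 1
    return first, last, m.group(3), declared

# A small sample copied from the listing above (not the complete data).
sample = [
    "# Newly assigned in Unicode 4.1.0 (XXX, 2005)",
    "0237..0241 ; 4.1 # [11] LATIN SMALL LETTER DOTLESS J..LATIN CAPITAL LETTER GLOTTAL STOP",
    "05A2 ; 4.1 # HEBREW ACCENT ATNAH HAFUKH",
    "FA70..FAD9 ; 4.1 # [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9",
    "1D200..1D245 ; 4.1 # [70] GREEK VOCAL NOTATION SYMBOL-1..GREEK MUSICAL LEIMMA",
]

sample_total = 0
for line in sample:
    parsed = parse_age_line(line)
    if parsed is None:
        continue  # comment line
    first, last, age, declared = parsed
    size = last - first + 1
    if size != declared:
        raise ValueError(f"count mismatch on line: {line}")
    sample_total += size

print(sample_total)
```

Running the same loop over the complete listing would be one way to cross-check the "Total code points" figure.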
Yes, a new normalizer is needed, but only for newly encoded and *conforming* documents that include these codepoints.
Otherwise, the previous normalizer can still be used interchangeably.
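To illustrate the point, here is a rough Python sketch of how one might test whether a text contains any of the newly assigned codepoints before deciding which normalizer to use. The name uses_new_code_points and the table NEW_IN_4_1_SAMPLE are hypothetical, and the table holds only a few ranges taken from the list above; a real check would need the full list.

```python
from bisect import bisect_right

# Illustrative subset of the ranges listed above, sorted and non-overlapping.
# A real check would load every 4.1 range from the UCD data.
NEW_IN_4_1_SAMPLE = [
    (0x0237, 0x0241),
    (0x0358, 0x035C),
    (0x1D6C, 0x1DC3),
    (0x2090, 0x2094),
    (0xFA70, 0xFAD9),
]
_STARTS = [first for first, _ in NEW_IN_4_1_SAMPLE]

def uses_new_code_points(text):
    """Return True if any character of text falls in one of the sample ranges."""
    for ch in text:
        cp = ord(ch)
        # Find the last range whose start is <= cp, then test its upper bound.
        i = bisect_right(_STARTS, cp) - 1
        if i >= 0 and cp <= NEW_IN_4_1_SAMPLE[i][1]:
            return True
    return False
```

A text for which this returns False (against the full list) contains none of the new codepoints, which is the case where the older normalizer is expected to give the same result.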
I am particularly interested, immediately, in the following new codepoints for Latin (all in the BMP):
0237..0241 ; 4.1 # [11] LATIN SMALL LETTER DOTLESS J..LATIN CAPITAL LETTER GLOTTAL STOP
0358..035C ; 4.1 # [5] COMBINING DOT ABOVE RIGHT..COMBINING DOUBLE BREVE BELOW
1D6C..1DC3 ; 4.1 # [88] LATIN SMALL LETTER B WITH MIDDLE TILDE..COMBINING SUSPENSION MARK
2090..2094 ; 4.1 # [5] LATIN SUBSCRIPT SMALL LETTER A..LATIN SUBSCRIPT SMALL LETTER SCHWA
(and I think these new characters will interest many people)...