From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Aug 27 2007 - 10:53:49 CDT
You are quite right: I wrote a bit too hastily. You did a nice analysis of
the characters that one should support, although I have some small
modifications below.
On
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Default_Ignorable_Code_Point:]
I'm seeing 6,346 Code Points. But we're really talking about whitespace plus
DI (Default_Ignorable_Code_Point), so that comes to 6,372 Code Points:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:][:whitespace:]]
The first cut of stuff to ignore in fonts would be 5,947 Code Points:
controls, surrogates, noncharacters, unassigned.
That leaves 419 code points (plus the few whitespace controls):
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:][:whitespace:]-[[:cc:][:cs:][:noncharacter_code_point:][:cn:]]]
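The arithmetic here can be double-checked with a few lines of Python. This is only a sketch: the three totals are simply copied from the counts quoted in this message (they reflect the Unicode data of the time and are not recomputed), and reading "the few whitespace controls" as the six unnamed control characters in the whitespace list is an assumption on my part.

```python
import unicodedata

# Counts quoted in the message; copied, not recomputed.
DI_PLUS_WHITESPACE = 6372
FIRST_CUT_IGNORED = 5947   # controls, surrogates, noncharacters, unassigned
QUOTED_REMAINDER = 419

# Assumption: "the few whitespace controls" are the whitespace code points
# that are themselves control characters (general category Cc).
WHITESPACE_CONTROLS = [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0085]


def is_control(cp):
    """True if the code point has general category Cc."""
    return unicodedata.category(chr(cp)) == "Cc"


remaining = DI_PLUS_WHITESPACE - FIRST_CUT_IGNORED
print(remaining)                             # 425
print(remaining - len(WHITESPACE_CONTROLS))  # 419
```

Under that reading, the 425 code points left after the first cut are the 419 returned by the query plus the six whitespace controls that the [:cc:] subtraction removes.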
I think it makes sense to support most if not all of the 26 whitespace
characters in fonts, although I'd group them into the following priorities
(the priorities would depend on the target audience for the font).
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:whitespace:]
General
0009 <http://unicode.org/cldr/utility/character.jsp?a=0009> ( ■ ) *no name*
000A <http://unicode.org/cldr/utility/character.jsp?a=000A> ( ■ ) *no name*
000D <http://unicode.org/cldr/utility/character.jsp?a=000D> ( ■ ) *no name*
0020 <http://unicode.org/cldr/utility/character.jsp?a=0020> ( ) SPACE
00A0 <http://unicode.org/cldr/utility/character.jsp?a=00A0> ( ) NO-BREAK SPACE
2007 <http://unicode.org/cldr/utility/character.jsp?a=2007> ( ) FIGURE SPACE
2008 <http://unicode.org/cldr/utility/character.jsp?a=2008> ( ) PUNCTUATION SPACE
Gray Area
2009 <http://unicode.org/cldr/utility/character.jsp?a=2009> ( ) THIN SPACE
200A <http://unicode.org/cldr/utility/character.jsp?a=200A> ( ) HAIR SPACE
202F <http://unicode.org/cldr/utility/character.jsp?a=202F> ( ) NARROW NO-BREAK SPACE
205F <http://unicode.org/cldr/utility/character.jsp?a=205F> ( ) MEDIUM MATHEMATICAL SPACE
3000 <http://unicode.org/cldr/utility/character.jsp?a=3000> ( ) IDEOGRAPHIC SPACE
2028 <http://unicode.org/cldr/utility/character.jsp?a=2028> ( ) LINE SEPARATOR
2029 <http://unicode.org/cldr/utility/character.jsp?a=2029> ( ) PARAGRAPH SEPARATOR
000B <http://unicode.org/cldr/utility/character.jsp?a=000B> ( ■ ) *no name*
000C <http://unicode.org/cldr/utility/character.jsp?a=000C> ( ■ ) *no name*
0085 <http://unicode.org/cldr/utility/character.jsp?a=0085> ( ■ ) *no name*
Specialized
2000 <http://unicode.org/cldr/utility/character.jsp?a=2000> ( ) EN QUAD
2001 <http://unicode.org/cldr/utility/character.jsp?a=2001> ( ) EM QUAD
2002 <http://unicode.org/cldr/utility/character.jsp?a=2002> ( ) EN SPACE
2003 <http://unicode.org/cldr/utility/character.jsp?a=2003> ( ) EM SPACE
2004 <http://unicode.org/cldr/utility/character.jsp?a=2004> ( ) THREE-PER-EM SPACE
2005 <http://unicode.org/cldr/utility/character.jsp?a=2005> ( ) FOUR-PER-EM SPACE
2006 <http://unicode.org/cldr/utility/character.jsp?a=2006> ( ) SIX-PER-EM SPACE
1680 <http://unicode.org/cldr/utility/character.jsp?a=1680> ( ) OGHAM SPACE MARK
180E <http://unicode.org/cldr/utility/character.jsp?a=180E> ( ) MONGOLIAN VOWEL SEPARATOR
OGHAM SPACE MARK is a strange case; as came up in the last meeting, IMO it
shouldn't be whitespace.
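To make the grouping concrete, here is a rough Python sketch of how one might check a font's coverage against these tiers. The tier contents are copied from the list above; the function name and the "hypothetical_cmap" set are made up for illustration and stand in for whatever a real font tool would report.

```python
# Priority tiers for the 26 whitespace code points, copied from the list above.
WHITESPACE_TIERS = {
    "General": [0x0009, 0x000A, 0x000D, 0x0020, 0x00A0, 0x2007, 0x2008],
    "Gray Area": [0x2009, 0x200A, 0x202F, 0x205F, 0x3000, 0x2028, 0x2029,
                  0x000B, 0x000C, 0x0085],
    "Specialized": [0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
                    0x1680, 0x180E],
}


def missing_by_tier(supported):
    """Return, per tier, the listed code points absent from `supported`."""
    return {
        tier: [f"U+{cp:04X}" for cp in cps if cp not in supported]
        for tier, cps in WHITESPACE_TIERS.items()
    }


# Hypothetical font that maps only printable ASCII plus NO-BREAK SPACE.
hypothetical_cmap = set(range(0x0020, 0x007F)) | {0x00A0}
report = missing_by_tier(hypothetical_cmap)
print(report["General"])
# ['U+0009', 'U+000A', 'U+000D', 'U+2007', 'U+2008']
```

A font aimed at a general audience would want the "General" list to come back empty; the other two tiers depend on the target audience, as noted above.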
The 259 Variation Selectors should all be supported -- at least the 16 in
all fonts, the other 240 in any CJK font, and the 3 Mongolian ones in any
Mongolian font.
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Variation_Selector:]
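The 259 total can be reproduced by simple range arithmetic. The code point ranges below are my addition from the Unicode code charts as they stood at the time of this message (they are not spelled out above), so treat this as a sanity check rather than a definition of the property.

```python
# Assumption: ranges of the Variation_Selector property as of 2007.
VARIATION_SELECTOR_RANGES = {
    "VS1-VS16 (all fonts)": (0xFE00, 0xFE0F),
    "VS17-VS256 (CJK fonts)": (0xE0100, 0xE01EF),
    "Mongolian FVS1-FVS3 (Mongolian fonts)": (0x180B, 0x180D),
}


def range_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


counts = {
    label: range_size(first, last)
    for label, (first, last) in VARIATION_SELECTOR_RANGES.items()
}
total = sum(counts.values())

for label, n in counts.items():
    print(f"{n:4d}  {label}")
print(f"{total:4d}  total")  # 16 + 240 + 3 = 259
```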
The Tag characters really should be deprecated -- they and the deprecated
characters do not need to be supported.
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[\U000E0000-\U000E007F][:deprecated:]]
That leaves the following 29 characters:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:]-[[:cc:][:cs:][:noncharacter_code_point:][:cn:][:whitespace:][\U0001D173-\U000E007F][:Variation_Selector:][:deprecated:][\U000E0000-\U000E007F]]]
Of these, I'd group the following way:
General
00AD <http://unicode.org/cldr/utility/character.jsp?a=00AD> ( ■ ) SOFT HYPHEN
034F <http://unicode.org/cldr/utility/character.jsp?a=034F> ( ■ ) COMBINING GRAPHEME JOINER
200B <http://unicode.org/cldr/utility/character.jsp?a=200B> ( ■ ) ZERO WIDTH SPACE
200C <http://unicode.org/cldr/utility/character.jsp?a=200C> ( ■ ) ZERO WIDTH NON-JOINER
200D <http://unicode.org/cldr/utility/character.jsp?a=200D> ( ■ ) ZERO WIDTH JOINER
200E <http://unicode.org/cldr/utility/character.jsp?a=200E> ( ■ ) LEFT-TO-RIGHT MARK
200F <http://unicode.org/cldr/utility/character.jsp?a=200F> ( ■ ) RIGHT-TO-LEFT MARK
202A <http://unicode.org/cldr/utility/character.jsp?a=202A> ( ■ ) LEFT-TO-RIGHT EMBEDDING
202B <http://unicode.org/cldr/utility/character.jsp?a=202B> ( ■ ) RIGHT-TO-LEFT EMBEDDING
202C <http://unicode.org/cldr/utility/character.jsp?a=202C> ( ■ ) POP DIRECTIONAL FORMATTING
202D <http://unicode.org/cldr/utility/character.jsp?a=202D> ( ■ ) LEFT-TO-RIGHT OVERRIDE
202E <http://unicode.org/cldr/utility/character.jsp?a=202E> ( ■ ) RIGHT-TO-LEFT OVERRIDE
2060 <http://unicode.org/cldr/utility/character.jsp?a=2060> ( ■ ) WORD JOINER
FEFF <http://unicode.org/cldr/utility/character.jsp?a=FEFF> ( ■ ) ZERO WIDTH NO-BREAK SPACE
Gray Area
2061 <http://unicode.org/cldr/utility/character.jsp?a=2061> ( ■ ) FUNCTION APPLICATION
2062 <http://unicode.org/cldr/utility/character.jsp?a=2062> ( ■ ) INVISIBLE TIMES
2063 <http://unicode.org/cldr/utility/character.jsp?a=2063> ( ■ ) INVISIBLE SEPARATOR
Specialized
0600 <http://unicode.org/cldr/utility/character.jsp?a=0600> ( ■ ) ARABIC NUMBER SIGN
0601 <http://unicode.org/cldr/utility/character.jsp?a=0601> ( ■ ) ARABIC SIGN SANAH
0602 <http://unicode.org/cldr/utility/character.jsp?a=0602> ( ■ ) ARABIC FOOTNOTE MARKER
0603 <http://unicode.org/cldr/utility/character.jsp?a=0603> ( ■ ) ARABIC SIGN SAFHA
06DD <http://unicode.org/cldr/utility/character.jsp?a=06DD> ( ■ ) ARABIC END OF AYAH
070F <http://unicode.org/cldr/utility/character.jsp?a=070F> ( ■ ) SYRIAC ABBREVIATION MARK
17B4 <http://unicode.org/cldr/utility/character.jsp?a=17B4> ( ■ ) KHMER VOWEL INHERENT AQ
17B5 <http://unicode.org/cldr/utility/character.jsp?a=17B5> ( ■ ) KHMER VOWEL INHERENT AA
115F <http://unicode.org/cldr/utility/character.jsp?a=115F> ( ■ ) HANGUL CHOSEONG FILLER
1160 <http://unicode.org/cldr/utility/character.jsp?a=1160> ( ■ ) HANGUL JUNGSEONG FILLER
3164 <http://unicode.org/cldr/utility/character.jsp?a=3164> ( ■ ) HANGUL FILLER
FFA0 <http://unicode.org/cldr/utility/character.jsp?a=FFA0> ( ■ ) HALFWIDTH HANGUL FILLER
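As a rough illustration of how this grouping could be used, the following Python sketch looks up which tier a character falls in and which tiers a given piece of text would require. The tier contents are copied from the list above; the names FORMAT_TIERS, tier_of and tiers_needed are hypothetical helpers, not part of any existing tool.

```python
# Priority tiers for the 29 remaining default-ignorable characters,
# copied from the list above.
FORMAT_TIERS = {
    "General": [0x00AD, 0x034F, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F,
                0x202A, 0x202B, 0x202C, 0x202D, 0x202E, 0x2060, 0xFEFF],
    "Gray Area": [0x2061, 0x2062, 0x2063],
    "Specialized": [0x0600, 0x0601, 0x0602, 0x0603, 0x06DD, 0x070F,
                    0x17B4, 0x17B5, 0x115F, 0x1160, 0x3164, 0xFFA0],
}

# Reverse index: code point -> tier name.
TIER_OF = {cp: tier for tier, cps in FORMAT_TIERS.items() for cp in cps}


def tier_of(cp):
    """Tier name for a listed code point, or None if it is not listed."""
    return TIER_OF.get(cp)


def tiers_needed(text):
    """Set of tiers a font must cover for the listed characters in `text`."""
    return {TIER_OF[ord(ch)] for ch in text if ord(ch) in TIER_OF}


print(len(TIER_OF))                            # 29
print(tier_of(0x200D))                         # General
print(sorted(tiers_needed("a\u200db\u2062")))  # ['General', 'Gray Area']
```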