Re: Control picture glyphs (was Re: Apostrophes at www.unicode.org)

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Aug 27 2007 - 10:53:49 CDT

    You are quite right: I wrote a bit too hastily. You did a nice analysis of
    the characters that one should support, although I have some small
    modifications below.

    On http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Default_Ignorable_Code_Point:]
    I'm seeing 6,346 code points. But we're really talking about whitespace plus
    DI, which comes to 6,372 code points:
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:][:whitespace:]]

    The first cut of stuff to ignore in fonts would be 5,947 Code Points:
    controls, surrogates, noncharacters, unassigned.

    That leaves 419 code points (plus the few whitespace controls):
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:][:whitespace:]-[[:cc:][:cs:][:noncharacter_code_point:][:cn:]]]
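
    For anyone who wants to try the "first cut" filter locally, here is a
    rough sketch, assuming Python's standard unicodedata module. It tests
    General_Category only: Cc (controls), Cs (surrogates) and Cn (unassigned,
    which this module also reports for noncharacters). The names in_first_cut
    and count_category are made up for illustration, and the results depend on
    the Unicode version bundled with the interpreter, so they will not line up
    exactly with the figures above.

```python
import unicodedata

# General categories dropped in the first cut: controls, surrogates, and
# unassigned code points (unicodedata reports noncharacters as Cn as well).
FIRST_CUT_CATEGORIES = {"Cc", "Cs", "Cn"}


def in_first_cut(cp: int) -> bool:
    """Return True if the code point falls in one of the first-cut categories."""
    return unicodedata.category(chr(cp)) in FIRST_CUT_CATEGORIES


def count_category(cat: str) -> int:
    """Count code points with the given General_Category in the whole code space."""
    return sum(
        1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == cat
    )


if __name__ == "__main__":
    for cp in (0x0009, 0x00AD, 0x200B, 0xD800, 0xFFFE):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)), in_first_cut(cp))
```

    Note that this looks at the whole code space, not just the Default
    Ignorable set, so it only covers the subtraction side of the query above.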

    I think it makes sense to support most if not all of the 26 whitespace
    characters in fonts, although I'd group them into the following priorities
    (the priorities would depend on the target audience for the font):
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:whitespace:]

    General
    0009 <http://unicode.org/cldr/utility/character.jsp?a=0009> ( ■ ) *no name*
    000A <http://unicode.org/cldr/utility/character.jsp?a=000A> ( ■ ) *no name*
    000D <http://unicode.org/cldr/utility/character.jsp?a=000D> ( ■ ) *no name*
    0020 <http://unicode.org/cldr/utility/character.jsp?a=0020> ( ) SPACE
    00A0 <http://unicode.org/cldr/utility/character.jsp?a=00A0> ( ) NO-BREAK
    SPACE
    2007 <http://unicode.org/cldr/utility/character.jsp?a=2007> ( ) FIGURE SPACE
    2008 <http://unicode.org/cldr/utility/character.jsp?a=2008> (   )
    PUNCTUATION SPACE

    Gray Area
    2009 <http://unicode.org/cldr/utility/character.jsp?a=2009> (   ) THIN SPACE
    200A <http://unicode.org/cldr/utility/character.jsp?a=200A> (   ) HAIR SPACE
    202F <http://unicode.org/cldr/utility/character.jsp?a=202F> ( ) NARROW
    NO-BREAK SPACE
    205F <http://unicode.org/cldr/utility/character.jsp?a=205F> (   ) MEDIUM
    MATHEMATICAL SPACE
    3000 <http://unicode.org/cldr/utility/character.jsp?a=3000> ( ) IDEOGRAPHIC
    SPACE

    2028 <http://unicode.org/cldr/utility/character.jsp?a=2028> ( ) LINE
    SEPARATOR
    2029 <http://unicode.org/cldr/utility/character.jsp?a=2029> ( ) PARAGRAPH
    SEPARATOR
    000B <http://unicode.org/cldr/utility/character.jsp?a=000B> ( ■ ) *no name*
    000C <http://unicode.org/cldr/utility/character.jsp?a=000C> ( ■ ) *no name*
    0085 <http://unicode.org/cldr/utility/character.jsp?a=0085> ( ■ ) *no name*

    Specialized
    2000 <http://unicode.org/cldr/utility/character.jsp?a=2000> (   ) EN QUAD
    2001 <http://unicode.org/cldr/utility/character.jsp?a=2001> (   ) EM QUAD
    2002 <http://unicode.org/cldr/utility/character.jsp?a=2002> (   ) EN SPACE
    2003 <http://unicode.org/cldr/utility/character.jsp?a=2003> (   ) EM SPACE
    2004 <http://unicode.org/cldr/utility/character.jsp?a=2004> (   )
    THREE-PER-EM SPACE
    2005 <http://unicode.org/cldr/utility/character.jsp?a=2005> (   )
    FOUR-PER-EM SPACE
    2006 <http://unicode.org/cldr/utility/character.jsp?a=2006> (   ) SIX-PER-EM
    SPACE
    1680 <http://unicode.org/cldr/utility/character.jsp?a=1680> (   ) OGHAM
    SPACE MARK
    180E <http://unicode.org/cldr/utility/character.jsp?a=180E> ( ᠎ ) MONGOLIAN
    VOWEL SEPARATOR

    OGHAM SPACE MARK is a strange case; as came up in the last meeting, IMO it
    shouldn't be whitespace.
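
    The three whitespace groups above could be kept as a small lookup table,
    for example when deciding which glyphs to add to a font first. This is
    only a sketch: the code points are copied from the lists above, while the
    names WHITESPACE_PRIORITY and whitespace_priority are invented here.

```python
# Priority groups for the 26 whitespace characters, copied from the lists above.
WHITESPACE_PRIORITY = {
    "general": [0x0009, 0x000A, 0x000D, 0x0020, 0x00A0, 0x2007, 0x2008],
    "gray area": [0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
                  0x2028, 0x2029, 0x000B, 0x000C, 0x0085],
    "specialized": [0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
                    0x2006, 0x1680, 0x180E],
}


def whitespace_priority(cp):
    """Return the priority group for a whitespace code point, or None."""
    for group, code_points in WHITESPACE_PRIORITY.items():
        if cp in code_points:
            return group
    return None


if __name__ == "__main__":
    for cp in (0x0020, 0x2009, 0x1680, 0x0041):
        print(f"U+{cp:04X}", whitespace_priority(cp))
```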

    The 259 Variation Selectors should all be supported -- at least the 16 in
    all fonts, but the other 240 in any CJK font and the 3 Mongolian ones in any
    Mongolian font.
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Variation_Selector:]
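
    As a quick check on the arithmetic, the Variation Selector blocks can be
    written out as ranges. This sketch assumes the assignments as they stood
    at the time of writing (FE00..FE0F, E0100..E01EF and 180B..180D); later
    Unicode versions may add to the property. The dictionary name and the
    group labels are just for illustration.

```python
# Variation Selector ranges, grouped by the kind of font that should cover them.
VARIATION_SELECTOR_RANGES = {
    "all fonts": range(0xFE00, 0xFE10),          # VARIATION SELECTOR-1..16
    "CJK fonts": range(0xE0100, 0xE01F0),        # VARIATION SELECTOR-17..256
    "Mongolian fonts": range(0x180B, 0x180E),    # MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
}

counts = {name: len(r) for name, r in VARIATION_SELECTOR_RANGES.items()}
total = sum(counts.values())

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("total:", total)  # 16 + 240 + 3 = 259
```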

    The Tag characters really should be deprecated -- they and the deprecated
    characters do not need to be supported.
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[\U000E0000-\U000E007F][:deprecated:]]

    That leaves the following 29 characters:
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:Default_Ignorable_Code_Point:]-[[:cc:][:cs:][:noncharacter_code_point:][:cn:][:whitespace:][\U0001D173-\U000E007F][:Variation_Selector:][:deprecated:][\U000E0000-\U000E007F]]]

    I'd group these the following way:

    General
    00AD <http://unicode.org/cldr/utility/character.jsp?a=00AD> ( ■ ) SOFT
    HYPHEN
    034F <http://unicode.org/cldr/utility/character.jsp?a=034F> ( ■ ) COMBINING
    GRAPHEME JOINER
    200B <http://unicode.org/cldr/utility/character.jsp?a=200B> ( ■ ) ZERO WIDTH
    SPACE
    200C <http://unicode.org/cldr/utility/character.jsp?a=200C> ( ■ ) ZERO WIDTH
    NON-JOINER
    200D <http://unicode.org/cldr/utility/character.jsp?a=200D> ( ■ ) ZERO WIDTH
    JOINER
    200E <http://unicode.org/cldr/utility/character.jsp?a=200E> ( ■ )
    LEFT-TO-RIGHT MARK
    200F <http://unicode.org/cldr/utility/character.jsp?a=200F> ( ■ )
    RIGHT-TO-LEFT MARK
    202A <http://unicode.org/cldr/utility/character.jsp?a=202A> ( ■ )
    LEFT-TO-RIGHT EMBEDDING
    202B <http://unicode.org/cldr/utility/character.jsp?a=202B> ( ■ )
    RIGHT-TO-LEFT EMBEDDING
    202C <http://unicode.org/cldr/utility/character.jsp?a=202C> ( ■ ) POP
    DIRECTIONAL FORMATTING
    202D <http://unicode.org/cldr/utility/character.jsp?a=202D> ( ■ )
    LEFT-TO-RIGHT OVERRIDE
    202E <http://unicode.org/cldr/utility/character.jsp?a=202E> ( ■ )
    RIGHT-TO-LEFT OVERRIDE
    2060 <http://unicode.org/cldr/utility/character.jsp?a=2060> ( ■ ) WORD
    JOINER
    FEFF <http://unicode.org/cldr/utility/character.jsp?a=FEFF> ( ■ ) ZERO WIDTH
    NO-BREAK SPACE

    Gray Area
    2061 <http://unicode.org/cldr/utility/character.jsp?a=2061> ( ■ ) FUNCTION
    APPLICATION
    2062 <http://unicode.org/cldr/utility/character.jsp?a=2062> ( ■ ) INVISIBLE
    TIMES
    2063 <http://unicode.org/cldr/utility/character.jsp?a=2063> ( ■ ) INVISIBLE
    SEPARATOR

    Specialized
    0600 <http://unicode.org/cldr/utility/character.jsp?a=0600> ( ■ ) ARABIC
    NUMBER SIGN
    0601 <http://unicode.org/cldr/utility/character.jsp?a=0601> ( ■ ) ARABIC
    SIGN SANAH
    0602 <http://unicode.org/cldr/utility/character.jsp?a=0602> ( ■ ) ARABIC
    FOOTNOTE MARKER
    0603 <http://unicode.org/cldr/utility/character.jsp?a=0603> ( ■ ) ARABIC
    SIGN SAFHA
    06DD <http://unicode.org/cldr/utility/character.jsp?a=06DD> ( ■ ) ARABIC END
    OF AYAH
    070F <http://unicode.org/cldr/utility/character.jsp?a=070F> ( ■ ) SYRIAC
    ABBREVIATION MARK

    17B4 <http://unicode.org/cldr/utility/character.jsp?a=17B4> ( ■ ) KHMER
    VOWEL INHERENT AQ
    17B5 <http://unicode.org/cldr/utility/character.jsp?a=17B5> ( ■ ) KHMER
    VOWEL INHERENT AA

    115F <http://unicode.org/cldr/utility/character.jsp?a=115F> ( ■ ) HANGUL
    CHOSEONG FILLER
    1160 <http://unicode.org/cldr/utility/character.jsp?a=1160> ( ■ ) HANGUL
    JUNGSEONG FILLER
    3164 <http://unicode.org/cldr/utility/character.jsp?a=3164> ( ■ ) HANGUL
    FILLER
    FFA0 <http://unicode.org/cldr/utility/character.jsp?a=FFA0> ( ■ ) HALFWIDTH
    HANGUL FILLER



    This archive was generated by hypermail 2.1.5 : Mon Aug 27 2007 - 10:56:37 CDT