Re: Refining the idea for the SignWriting proposal

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jun 21 2010 - 05:33:32 CDT


    It's very interesting, but there are various things to comment on
    regarding the concept of 2D layout for rendering some complex scripts,
    and the need to define a markup language separately from a limited set
    of abstract characters (which may cover several ranges of symbolic
    glyphs that are more or less geometrically related) that, alone, can't
    represent the full script in a semantically meaningful way.

    If the markup language is simple enough, the semantics could be
    searchable in plain text, but there's no guarantee that this will be
    possible unless the collection of abstract characters is large enough
    to be selective (for SignWriting, we could identify "words" and
    possibly create a stable plain-text "orthography" if this markup
    language is stable and normalizable, with the newly encoded script
    properties of the characters needed by this language).

    Anyway, it's also interesting to see how the iswa.org server is
    already using a "lightweight SignWriting Cartesian Markup" on its
    image server for building images from a markup string given to an
    "image.php" server-side script that takes:
    - a string like "symbolId,x,y,symbolId,x,y,..."
    - and some additional separate parameters like "size=".
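
    As a minimal sketch of how such a string could be split into
    placements (this is not the actual image.php code): the Placement
    type, the parse_markup name, the integer coordinates and the sample
    symbolId values below are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    symbol_id: str  # opaque ISWA symbolId; its hyphens are kept as-is
    x: int          # assumed to be an integer coordinate
    y: int


def parse_markup(markup: str) -> List[Placement]:
    """Split 'symbolId,x,y,symbolId,x,y,...' into a list of placements."""
    if not markup.strip():
        return []
    fields = [field.strip() for field in markup.split(",")]
    if len(fields) % 3 != 0:
        raise ValueError("expected groups of three fields: symbolId,x,y")
    return [
        Placement(fields[i], int(fields[i + 1]), int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]
```

    Splitting on commas is enough here only because the symbolId uses
    hyphens, not commas, as its internal separator.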

    The characters are currently encoded with the ISWA "symbolId" which is
    a long string of small decimal numbers separated by hyphens,
    structured in groups where the first numbers indicate their category,
    symbol group, rotation and filling; converting these symbol
    ids into shorter Unicode/ISO/IEC 10646 code points still requires some
    investigation into which groups (defined by some hyphen-separated
    symbolId prefix) need separate encoding.

    Notably, I'm not sure whether the various rotations (or mirroring, or
    alternate fillings, or colors) are the same abstract character to
    encode, or if they should be encoded separately, given that the
    collection of symbols is not enough to convey an actual meaning
    without the markup needed to create the actual meaningful grapheme
    clusters (for SignWriting), or to spatially describe a full scene
    (e.g. in DanceWriting).

    It may be interesting to see if such Cartesian markup can be used for
    other scripts with 2D layout (notably hieroglyphs, or for
    approximating sinograms built from basic strokes).

    Possibly, the characters needed to represent the 2D layout in the
    "markup language" could become new "format control" characters (in
    addition to those needed to represent the symbols/traits components of
    meaningful grapheme clusters), encoded as such in Unicode and possibly
    usable for several ranges of complex scripts with similar 2D layouts,
    notably because the meaningful grapheme clusters will become
    searchable in plain-text.

    This also suggests a new separate general category for the abstract
    symbols/traits encoded for such complex scripts, instead of assigning
    them in "gc=Lo" or defining them as unrelated symbols in "gc=S*":
    possibly "gc=Lx"?

    ----
    One problem is how the "simple 2D markup language" defines the
    coordinate system and how it relates to the unit size of
    glyphs, before positioning them with x,y coordinates, and how the x,y
    coordinates can be normalized.
    There also does not seem to be any encoding (in this running image
    server) to define separate relative sizes for symbols before they are
    positioned.
    I also don't know how the image server can infer the total image width
    of the composed cluster: is the resulting image (when using the same
    value for the "size=..." query parameter) assumed to be a bitmap
    constrained within the same display width and height?
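
    One possible normalization, sketched under the assumption that a
    cluster is just a list of (symbolId, x, y) items with integer
    coordinates: translate everything so that the smallest x and y
    become 0. Two clusters that differ only by a translation then
    compare equal. The normalize name is hypothetical, and the order of
    items is deliberately kept, since it may matter for overlapping
    symbols.

```python
from typing import List, Tuple

Item = Tuple[str, int, int]  # (symbol id, x, y)


def normalize(items: List[Item]) -> List[Item]:
    """Translate a cluster so that its smallest x and y are both 0."""
    if not items:
        return []
    min_x = min(x for _, x, _ in items)
    min_y = min(y for _, _, y in items)
    # Item order is preserved: it may encode the stacking order.
    return [(symbol, x - min_x, y - min_y) for symbol, x, y in items]
```

    This says nothing about the unit size of the glyphs, which remains
    the open question above.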
    ----
    There seem to exist only enumerated values for fill types and
    rotation types. Maybe other hieroglyphic and ideographic writing
    systems will need more values (notably "DanceWriting", described on the
    same site but using additional symbols for feet and lower parts of the
    body, or more complex movements).
    I don't think that the x,y coordinates for positioning symbols, or
    rotation types, or mirror types (or even stretching/narrowing) should
    be encoded as separate modifiers. But fill types certainly look
    like good candidates for variant encodings.
    However, the proposal encodes symbols with a canonical color (defined
    as an sRGB 6-digit hex number, as in HTML and CSS). If color conveys
    some meaning, it should be encodable in the markup (maybe this is
    possible). Are there symbols that have a different meaning when they
    are colored differently? Here also the canonical color should be
    inferred, even if it may be styled.
    Maybe the ideal system should use a common standard for such markup
    (and in fact it could be applied as well to the layout of mathematics,
    if that's possible). But there will remain issues such as the variable
    shaping of some symbols (like cartouches in Egyptian hieroglyphs, or
    similar surrounding/overlay strokes or arrows in maths notation). Is
    there a way to make it compatible or convertible with MathML (where
    color is also an option that may have some meaning)?
    ----
    For me, there already exists a standard markup for Cartesian layout:
    the W3C's SVG standard. It can already easily perform all the
    needed 2D positioning and basic transformations (rotations, mirroring,
    stretching/narrowing, and even slanting), as well as changing the
    default color of symbols (if they are defined as SVG
    primitives or groups, and referenced by their ID, where they would be
    colored by the "fill:" style property, provided that there's no color
    reassignment within the defined shapes), or the font for text primitives
    (fonts are special because their glyphs are colored by the "color:"
    property, which is then used to recolor fill:ed shapes and sometimes
    stroke:ed shapes, through inheritance and defaults).
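
    To make this concrete, here is a rough sketch that turns a list of
    placed symbols into SVG <use> elements. The cluster_to_svg name and
    the tuple layout are assumptions for illustration, and the glyph
    definitions referenced by ID are assumed to exist elsewhere (for
    instance in a <defs> section that this sketch does not produce).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def cluster_to_svg(items, width, height):
    """items: iterable of (glyph_id, x, y, rotation_degrees, mirrored, fill)."""
    svg = ET.Element(
        f"{{{SVG_NS}}}svg", {"width": str(width), "height": str(height)}
    )
    for glyph_id, x, y, rotation, mirrored, fill in items:
        # The rightmost transform applies first: mirror the glyph about
        # its own y axis, rotate it about its origin, then move it.
        transform = f"translate({x},{y}) rotate({rotation})"
        if mirrored:
            transform += " scale(-1,1)"
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}use",
            {
                f"{{{XLINK_NS}}}href": f"#{glyph_id}",
                "transform": transform,
                # Inherited by the referenced shapes unless they set
                # their own fill.
                "fill": fill,
            },
        )
    return ET.tostring(svg, encoding="unicode")
```

    For example, cluster_to_svg([("hand", 10, 20, 45, True, "#000000")],
    100, 100) yields a single <use> element referencing "#hand".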
    Note also that some symbols may at first appear to be drawn with a
    transparent background, but could be in a layout where they would hide
    some parts of a background glyph. This does not mean that those
    foreground glyphs will be drawn by filling them with white, but that
    some clipping will occur within the background shapes, so that the
    resulting cluster will still have a transparent background (including
    within holes).
    When performing the layout, there may then be at least two actual
    shapes to render simultaneously for each symbol: one will clip the
    background shapes (computing an intersection with negated masking
    shapes), another will add an area to draw on top of it (computing a
    union of shapes, possibly with distinct coloring classes if some
    glyphs are multicolored).
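
    The clip-then-union step can be sketched with plain sets of pixel
    coordinates standing in for real shapes. This is a single-color toy
    model: the compose name and the (mask, ink) pair per symbol are
    assumptions for illustration only.

```python
def compose(layers):
    """Compose symbols given as (mask, ink) pixel sets, background first.

    Returns the set of inked pixels; every other pixel stays transparent.
    """
    result = set()
    for mask, ink in layers:
        result -= mask  # clip whatever lies under this symbol's mask
        result |= ink   # then add the symbol's own drawn area
    return result
```

    A pixel that is masked but not inked ends up transparent rather
    than white, which is the point made above.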
    ----
    The bad thing is that the SVG standard was not defined with the simple
    goal of defining simple 2D glyphs for use in fonts. It may be too
    complex for this purpose, but maybe there's a way to define and
    standardize a reduced SVG profile just to meet this goal, excluding:
    - almost all complex "fill" attributes such as pattern filling and
    color effects, including opacity, color names and even the sRGB color
    space (all replaced by a standard small enumeration of coloring
    classes such as {clipped/transparent/background, foreground-1,
    foreground-2...}). The shapes making up a glyph may still be given some
    color/clipping classes, without defining their actual
    rendering color (which will then be stylable separately).
    - complex geometry features such as "stroke"s: transforming the SVG
    "strokes" into SVG "filled" shapes requires computing stroke widths
    and dashes, limiting miter joins, computing rounded/square joins...
    and using a different "stroke:" style property (that also
    overrides the main "color:" style property).
    - other extensions that may be added soon, like a 3D coordinate
    system and 3D-to-2D projections (including perspective computed
    efficiently with 4D matrices): it can be assumed that all shapes will
    be 2D and their geometry fully precomputed in a single coordinate
    system (possibly even without internal 2D transforms).
    - support for embedded CSS styling (which should be defined and used
    outside of this simple SVG profile).
    - all other unneeded/insecure extensions such as scripting.
    Note that to support color and clipping in an external stylesheet,
    when all color and transparency/clipping information is removed from
    the SVG definition, it will be necessary to have standard class names
    for the shapes present in glyphs: "margins" for the outer convex shape
    to reserve as margins on the final rendered image and used to compute
    the default bounding box, "background" for the transparent area of the
    glyph to clip out from other background glyphs, and "foreground", the
    latter being optionally combinable with the "color1", "color2"...
    class names for multicolor glyphs when some shapes don't necessarily
    use the main "foreground" color.
    All the shapes should probably be ordered by ascending
    coloring class (but this may require precomputing the clipped
    geometry between the various colors, something that would create more
    complex shapes and is probably not needed when this can be performed
    by the SVG renderer). If several successive shapes of a defined glyph
    share the same color class, they should be merged into the same SVG
    group or into a single polyline primitive.
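
    A checker for such a reduced profile might look like the sketch
    below. No such profile exists yet, so the element whitelist, the
    forbidden attributes and the check_profile name are only one
    possible reading of the exclusions and class names listed above.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
# Assumed whitelist: groups, closed outlines and a few basic shapes.
ALLOWED_ELEMENTS = {"svg", "g", "path", "polygon", "circle", "ellipse", "rect"}
# Assumed blacklist: all color, stroke and inline-style information.
FORBIDDEN_ATTRS = {"style", "fill", "stroke", "opacity"}
ALLOWED_CLASSES = {"margins", "background", "foreground"}


def is_allowed_class(name):
    """Accept the fixed class names plus color1, color2, ..."""
    if name in ALLOWED_CLASSES:
        return True
    return name.startswith("color") and name[5:].isdigit()


def check_profile(svg_text):
    """Return a list of problems; an empty list means the glyph passes."""
    problems = []
    for el in ET.fromstring(svg_text).iter():
        tag = el.tag[len(SVG):] if el.tag.startswith(SVG) else el.tag
        if tag not in ALLOWED_ELEMENTS:
            problems.append(f"element <{tag}> not allowed")
        for attr in el.attrib:
            # "on..." covers event-handler attributes such as onclick.
            if (attr in FORBIDDEN_ATTRS or attr.startswith("stroke-")
                    or attr.startswith("on")):
                problems.append(f"attribute {attr} not allowed on <{tag}>")
        for name in el.get("class", "").split():
            if not is_allowed_class(name):
                problems.append(f"unknown class {name} on <{tag}>")
    return problems
```

    Colors would then come only from an external stylesheet keyed on
    the class names.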
    ----
    On the other hand, there's probably no problem in accepting (with such
    a reduced SVG profile for defining font glyphs) the support of "groups"
    <svg:g> with internal 2D transforms and repositioning, or of basic
    shapes like circles/ellipses/rectangles in addition to closed
    polylines (including straight segments and Bezier curves, as long as a
    bounding box can be computed simply from the convex hull of control
    points, after all possible 2D transforms).
    And finally, assembling the cluster will require computing the resulting
    bounding box from the extrema of the bounding boxes of all symbol
    components (defined in fonts or in the simple SVG profile), and
    probably also a "margin box" from the extrema of the margin boxes of
    all symbol components.
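
    Both computations are short. In the sketch below (the function names
    and the box layout are assumptions), a box is taken from already
    transformed control points, and the cluster box is the union of the
    component boxes; the same union would serve for margin boxes.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_of_points(points: Iterable[Point]) -> Box:
    """Bounding box of control points, after any 2D transforms."""
    pts = list(points)
    if not pts:
        raise ValueError("no points")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def union_box(boxes: Iterable[Box]) -> Box:
    """Smallest box containing every component box."""
    bs = list(boxes)
    if not bs:
        raise ValueError("no boxes")
    return (
        min(b[0] for b in bs),
        min(b[1] for b in bs),
        max(b[2] for b in bs),
        max(b[3] for b in bs),
    )
```

    Since a Bezier curve lies within the convex hull of its control
    points, the box obtained this way always contains the curve, though
    it may be somewhat larger than the tight box.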
    ----
    This reduced SVG profile could then be used to generate the glyphs
    embedded in fonts, except that it will still lack the support of
    hinting (something that is still not instructable in standard SVG
    shapes, where the internal control points coordinates are not supposed
    to be transformed according to the effective final 2D transform and to
    the geometric/sampling properties or color masks of the target
    rendering area).
    Note that font hinting is still something that is very weakly defined,
    because it is absolutely not technology-neutral and assumes some
    physical (or perceptive) properties of the target devices that are
    used today:
    - It only works well with color or monochrome LCD/LED/plasma flat
    display panels (and only if the target subpixels are individually and
    predictably addressable in a regular rectangular lattice for each
    color plane, and only if the colored subpixels have standard
    chromaticity and are calibrated in a standard color model: newer flat
    panels that use a different RGBY or RGBW color model, or a more
    isotropic triangular/hexagonal lattice, can't even benefit from the
    current proprietary font hinting technologies).
    - But it behaves extremely badly for inkjet and laser printing, or
    high-quality polychromatic offset printing, or when the light of a
    single pixel expands on several surrounding physical pixels of a
    surface with higher pixel density than light beam density (notably on
    CRT displays and on paper with fluid ink, where the "subpixels" are
    not individually addressable), so much so that hinting instructions
    inserted in fonts are simply discarded when printing the document
    initially viewed on screen (as a consequence, some heavily hinted
    fonts become completely unusable when printing and often have to be
    substituted by another one, designed or hinted very differently for
    the printing device).
    - It is extremely complicated (time-consuming and costly) to define and
    tune in fonts (both globally and for each glyph), as it requires an
    extremely high level of expertise to understand visually in all
    aspects (including when kerning sequences of glyphs).
    - It generates lots of rendering inconsistencies in various applications
    (notably geometric problems with rotated fonts, or color artefacts
    when the rendered bitmap images are rescaled or stretched for
    display on devices other than the one known by the renderer when it
    was running).
    - In addition, it requires a proprietary scripting engine in the font
    renderer, which may be difficult to secure.
    - It makes hinted fonts non-portable across systems, independently of
    the target display device finally used. Fonts also have to be hinted
    several times for different proprietary font renderers (or versions).
    - The specifications of such engines (or just the possibility of
    embedding hinting instructions in fonts) are encumbered by very
    restrictive patents.
    - The interest of font hinting will decline, as it only
    addresses the specific limitations of pixel density on some classes of
    displays, whose technology could rapidly become obsolete; it may
    become much simpler to supersample the glyphs to render and let each
    target device scale down the image to its own properties, if this is
    still needed (there now exist excellent filtering algorithms, widely
    used in digital photography and video, for avoiding anisotropic
    color inconsistencies or fuzzy effects on line borders, or for enhancing
    the contrast and maintaining a perceptively correct geometry, when
    bitmap images and their color model are remapped to different
    targets).
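
    As a toy illustration of the supersampling idea in the last point:
    render the glyph coverage at several times the target resolution,
    then average blocks of samples down to one value per pixel. The
    plain box filter below is only the simplest possible stand-in, not
    one of the better filters alluded to, and the downsample name is
    hypothetical.

```python
def downsample(coverage, factor):
    """Average factor x factor blocks of a supersampled coverage bitmap.

    coverage is a list of equal-length rows of 0/1 (or fractional)
    samples; the result holds one averaged value per target pixel.
    """
    height, width = len(coverage), len(coverage[0])
    if height % factor or width % factor:
        raise ValueError("bitmap size must be a multiple of factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            total = sum(
                coverage[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            row.append(total / (factor * factor))
        out.append(row)
    return out
```

    Each output value is a gray level between 0 and 1 that the target
    device can map to its own pixels.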
    Philippe.
    "André Szabolcs Szelp" <a.sz.szelp@gmail.com> wrote:
    >
    > why does the base character in the second example have a different "default"
    > fill?
    > Even if that would happen to be the most common version, I think you should
    > have a consistent base-fill and fill modifiers which does not depend on an
    > implied base fill.
    >
    > On Tue, Jun 15, 2010 at 4:51 PM, Stephen Slevinski <slevinski@gmail.com> wrote:
    >
    > > Just a few more minutes of your time...
    > >
    > > I will be dividing my SignWriting proposal into 2 parts.  First, encoding
    > > the symbols of the ISWA 2010.  Second, a technical note describing a
    > > lightweight SignWriting Cartesian Markup that can be used with the symbols
    > > for script layout.
    > >
    > > My proposal for encoding the symbols will require 674 code points.
    > > * 652 for the BaseSymbols
    > > * 6 for the fill modifiers
    > > * 16 for the rotation modifiers
    > >
    > > The SignWriting symbol set defines 37,812 valid symbols.  Each of these
    > > symbols can be defined with 3 characters: BaseSymbol, fill modifier, and
    > > rotation modifier.
    > >
    > > There are potentially 62,592 character combinations, but not all are
    > > valid.  Each BaseSymbol has a list of valid fills and valid rotations.
    > >
    > > A few examples...
    > >
    > > BaseSymbol 77 (U+1D852) , can be viewed by itself.  A different glyph is
    > > displayed when followed by fill modifier 3 (U+1DA94) and rotation modifier 1
    > > (U+1DA98) .
    > >
    > > BaseSymbol 136 (U+1D88D) , can be viewed by itself.  A different glyph is
    > > displayed when followed by fill modifier 1 (U+1DA92) and rotation modifier 2
    > > (U+1DA99) .
    > >
    > > All of the symbols are documented in the ISWA 2010 HTML Reference.  This
    > > reference will be updated as part of the proposal:
    > > http://www.signbank.org/iswa
    > >
    > > It will be proposed that initially fonts have restrictions for size and
    > > shape.  This restriction should be lifted if a scheme can be created that
    > > eliminates the requirement of exact symbol placement for proper script
    > > layout.
    > >
    > > Would such a proposal be close enough to the Unicode standard?
    > >
    > > Thanks for your time,
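
    The arithmetic in the quoted proposal can be checked with a short
    sketch. The three offsets are inferred from the two examples given
    in the quoted message (BaseSymbol 77 and 136) and from assuming
    that numbering starts at 1; they are proposal values, not assigned
    Unicode code points, and the encode_symbol name is hypothetical.

```python
BASE_OFFSET = 0x1D805      # BaseSymbol n -> U+1D805 + n (inferred)
FILL_OFFSET = 0x1DA91      # fill modifier n -> U+1DA91 + n (inferred)
ROTATION_OFFSET = 0x1DA97  # rotation modifier n -> U+1DA97 + n (inferred)

BASE_COUNT, FILL_COUNT, ROTATION_COUNT = 652, 6, 16


def encode_symbol(base, fill, rotation):
    """Return the three proposed code points for one symbol."""
    if not 1 <= base <= BASE_COUNT:
        raise ValueError("base out of range")
    if not 1 <= fill <= FILL_COUNT:
        raise ValueError("fill out of range")
    if not 1 <= rotation <= ROTATION_COUNT:
        raise ValueError("rotation out of range")
    return [BASE_OFFSET + base, FILL_OFFSET + fill, ROTATION_OFFSET + rotation]


# Totals quoted in the proposal.
code_points_needed = BASE_COUNT + FILL_COUNT + ROTATION_COUNT     # 674
possible_combinations = BASE_COUNT * FILL_COUNT * ROTATION_COUNT  # 62592
```

    With these offsets the three ranges are contiguous, and only 37,812
    of the 62,592 combinations are stated to be valid, so a real
    implementation would also need the per-symbol lists of valid fills
    and rotations.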
    

