From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jun 21 2010 - 05:33:32 CDT
It's very interesting, but there are various things to comment about
the concept of 2D layout to render some complex scripts, and the need
to define some markup language separately from a limited set of abstract
characters (which may cover several ranges of symbolic glyphs that are
more or less geometrically related) that, alone, can't represent the
full script in a semantically meaningful way.
If the markup language is simple enough, the semantics could be
searchable in plain-text, but there's no guarantee that this will be
possible, unless the collection of abstract characters is large enough
to be selective (for SignWriting, we could identify "words" and
possibly create a stable plain-text "orthography" if this markup
language is stable and normalizable, with the newly encoded script
properties of the characters needed by this language).
Anyway, it's also interesting to see how the iswa.org server is
already using a "lightweight SignWriting Cartesian Markup" on its
images server for building images from a markup string given to an
"image.php" server-side script that takes:
- a string such as "symbolId,x,y,symbolId,x,y,..."
- and some additional separate parameters like "size=".
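To make the shape of that string concrete, here is a minimal Python sketch of how such a markup string could be split into positioned symbols. It only assumes what is described above (comma-separated triples of symbolId, x, y). The names `Placement` and `parse_cartesian_markup`, the use of integer coordinates, and the sample string are hypothetical and not taken from the iswa.org server.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    symbol_id: str  # hyphen-separated symbolId, kept as an opaque string
    x: int          # assumed to be an integer coordinate
    y: int


def parse_cartesian_markup(markup: str) -> List[Placement]:
    """Split "symbolId,x,y,symbolId,x,y,..." into (symbolId, x, y) triples."""
    fields = [field.strip() for field in markup.split(",")]
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of 3 comma-separated fields")
    return [
        Placement(fields[i], int(fields[i + 1]), int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]


# Made-up sample; the symbolIds are placeholders, not real ISWA entries.
sample = "01-01-001-01-01-01,10,20,02-05-003-01-02-04,35,5"
placements = parse_cartesian_markup(sample)
```

Because the symbolIds use hyphens and the triples use commas, the two levels of structure do not collide, which is what makes a simple split sufficient here.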
The characters are currently encoded with the ISWA "symbolId" which is
a long string of small decimal numbers separated by hyphens,
structured in groups where the first numbers indicate their category,
symbol group, rotation and filling; the need to convert these symbol
ids into shorter Unicode/ISO/IEC 10646 code points still needs some
investigation about which groups (defined by some hyphen-separated
symbolId prefix) need separate encoding.
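One way to explore that question is to bucket the existing symbolIds by a hyphen-separated prefix and look at how many symbols each candidate group contains. The Python sketch below does only that. The function name `group_by_prefix`, the prefix depth, and the sample IDs are hypothetical, and nothing here says which depth corresponds to category, group, fill or rotation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_prefix(symbol_ids: Iterable[str], depth: int) -> Dict[Tuple[int, ...], List[str]]:
    """Group hyphen-separated numeric IDs by their first `depth` numbers."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for symbol_id in symbol_ids:
        numbers = tuple(int(part) for part in symbol_id.split("-"))
        groups[numbers[:depth]].append(symbol_id)
    return dict(groups)


# Made-up IDs for illustration only.
ids = [
    "01-01-001-01-01-01",
    "01-01-001-01-02-03",
    "01-02-004-01-01-01",
    "02-01-001-01-01-01",
]
by_two = group_by_prefix(ids, 2)
```

Running the grouping at several depths would show how many distinct code points each choice of prefix implies.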
Notably, I'm not sure that the various rotations (or mirroring, or
alternate fillings, or colors) are the same abstract character to
encode, or if they should be encoded separately, given that the
collection of symbols is not enough to convey an actual meaning
without the markup needed to create the actual meaningful grapheme
clusters (for SignWriting), or to spatially describe a full scene
(e.g. in DanceWriting).
It may be interesting to see if such Cartesian markup can be used for
other scripts with 2D layout (notably hieroglyphs, or for
approximating sinograms built from basic strokes).
Possibly, the characters needed to represent the 2D layout in the
"markup language" could become new "format control" characters (in
addition to those needed to represent the symbols/traits components of
meaningful grapheme clusters), encoded as such in Unicode and possibly
usable for several ranges of complex scripts with similar 2D layouts,
notably because the meaningful grapheme clusters will become
searchable in plain-text.
This also suggests a new separate general category for the abstract
symbols/traits encoded for such complex scripts, instead of assigning
them in "gc=Lo" or defining them as unrelated symbols in "gc=S*":
possibly "gc=Lx"?
----
One problem is how the "simple 2D markup language" defines its
coordinate system, how that system relates to the unit size of the
glyphs before they are positioned with x,y coordinates, and how the
x,y coordinates can be normalized. There also does not seem to be any
encoding (in this running image server) to define separate relative
sizes for symbols before they are positioned. I also don't know how
the image server can infer the total image width of the composed
cluster: is the resulting image (when using the same value for the
"size=..." query parameter) assumed to be a bitmap constrained within
the same display width and height?
----
There seem to exist only enumerated values for fill types and
rotation types. Other hieroglyphic and ideographic writing systems may
need more values (notably "DanceWriting", described on the same site
but using additional symbols for feet and lower parts of the body, or
more complex movements).

I don't think that the x,y coordinates for positioning symbols, or
rotation types, or mirror types (or even stretching/narrowing) should
be encoded as separate modifiers. But fill types certainly look like
good candidates for variant encodings.

However, the proposal encodes symbols with a canonical color (defined
as an sRGB 6-digit hex number, as in HTML and CSS). If color conveys
some meaning, it should be encodable in the markup (maybe this is
possible). Are there symbols that have a different meaning when they
are colored differently? Here also the canonical color should be
inferred, even if it may be styled.

Maybe the ideal system should use a common standard for such markup
(and in fact it could be applied as well to the layout of
mathematics, if that's possible). But there will remain issues such as
the variable shaping of some symbols (like cartouches in Egyptian
hieroglyphs, or similar surrounding/overlay strokes or arrows in maths
notation).
Is there a way to make it compatible or convertible with MathML (where
color is also an option that may have some meaning)?
----
For me, there already exists a standard markup for Cartesian layout:
the W3C's SVG standard. It can already easily perform all the needed
2D positioning and basic transformations (rotations, mirroring,
stretching/narrowing, and even slanting), as well as changing the
default color of symbols (if they are defined as SVG primitives or
groups and referenced by their ID, where they would be colored by the
"fill:" style property, provided that there's no color reassignment
within the defined shapes), or the font for text primitives (fonts are
special because their glyphs are colored by the "color:" property,
which is then used to recolor filled shapes and sometimes stroked
shapes, through inheritance and defaults).

Note also that some symbols may at first look as if they were drawn
with a transparent background, but could be in a layout where they
would hide some parts of a background glyph. This does not mean that
those foreground glyphs will be drawn by filling them with white, but
that some clipping will occur within the background shapes, so that
the resulting cluster will still have a transparent background
(including within holes). When performing the layout, there may then
be at least two actual shapes to render simultaneously for each
symbol: one will clip the background shapes (computing an intersection
with negated masking shapes), another will add an area to draw on top
of it (computing a union of shapes, possibly with distinct coloring
classes if some glyphs are multicolored).
----
The bad thing is that the SVG standard was not defined with the simple
goal of defining simple 2D glyphs for use in fonts.
It may be too complex for this purpose, but maybe there's a way to
define and standardize a reduced SVG profile just to meet this goal,
excluding:
- almost all complex "fill" attributes such as pattern filling and
  color effects, including opacity, color names and even the sRGB
  color space (all replaced by a small standard enumeration of
  coloring classes such as {clipped/transparent/background,
  foreground-1, foreground-2...}). The shapes making up a glyph may
  still be given some color/clipping classes, without defining their
  actual rendering color (which will then be stylable separately).
- complex geometry features such as "stroke"s: transforming SVG
  strokes into SVG filled shapes requires computing stroke widths and
  dashes, limiting miter joins, computing rounded/square joins... and
  using a different "stroke:" style property (that also overrides the
  main "color:" style property).
- other extensions that may be added soon, like a 3D coordinate system
  and 3D-to-2D projections (including perspective computed efficiently
  with 4D matrices): it can be assumed that all shapes will be 2D and
  their geometry fully precomputed in a single coordinate system
  (possibly even without internal 2D transforms).
- support for embedded CSS styling (which should be defined and used
  outside of this simple SVG profile).
- all other unneeded/insecure extensions such as scripting.

Note that to support color and clipping in an external stylesheet,
when all color and transparency/clipping information is removed from
the SVG definition, it will be necessary to have standard class names
for the shapes present in glyphs ("margins" for the outer convex shape
to reserve as margins on the final rendered image and used to compute
the default bounding box, "background" for the transparent area of the
glyph to clip out from other background glyphs, and "foreground", the
latter being optionally combinable with the "color1", "color2"...
class names for multicolor glyphs when some shapes don't necessarily
use the main "foreground" color).

All the shapes should preferably be ordered by ascending coloring
class (but this may require precomputing the clipped geometry between
the various colors, something that would create more complex shapes
and is probably not needed when this can be performed by the SVG
renderer). If several successive shapes of a defined glyph share the
same color class, they should be merged into the same SVG group or
into a single polyline primitive.
----
Conversely, there's probably no problem in accepting (with such a
reduced SVG profile for defining font glyphs) support for "groups"
<svg:g> with internal 2D transforms and repositioning, or for basic
shapes like circles/ellipses/rectangles in addition to closed
polylines (including straight segments and Bezier curves, as long as a
bounding box can be computed simply from the convex hull of control
points, after all possible 2D transforms).

And finally, assembling the cluster will require computing the
resulting bounding box from the extrema of the bounding boxes of all
symbol components (defined in fonts or in the simple SVG profile), and
probably also a "margin box" from the extrema of the margin boxes of
all symbol components.
----
This reduced SVG profile could then be used to generate the glyphs
embedded in fonts, except that it will still lack support for hinting
(something that still cannot be expressed in standard SVG shapes,
where the coordinates of internal control points are not supposed to
be transformed according to the effective final 2D transform and to
the geometric/sampling properties or color masks of the target
rendering area).
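As an aside on the cluster assembly step mentioned above (taking the extrema of the component bounding boxes), here is a minimal Python sketch under simple assumptions: each component has an axis-aligned box in its own glyph coordinates and is placed by a plain x,y translation, with no rotation or scaling. The names `Box`, `union` and `cluster_box` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def translated(self, dx: float, dy: float) -> "Box":
        """Return this box moved by (dx, dy)."""
        return Box(self.x_min + dx, self.y_min + dy, self.x_max + dx, self.y_max + dy)


def union(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned box containing all the given boxes."""
    items: List[Box] = list(boxes)
    if not items:
        raise ValueError("cannot compute the bounding box of an empty cluster")
    return Box(
        min(b.x_min for b in items),
        min(b.y_min for b in items),
        max(b.x_max for b in items),
        max(b.y_max for b in items),
    )


def cluster_box(components: Iterable[Tuple[Box, float, float]]) -> Box:
    """Bounding box of a cluster given (glyph_box, x, y) placements."""
    return union(box.translated(x, y) for box, x, y in components)


# Two made-up components placed at (5, 5) and (0, 30).
result = cluster_box([(Box(0, 0, 10, 20), 5, 5), (Box(0, 0, 30, 8), 0, 30)])
```

The same `cluster_box` call applied to the components' margin boxes instead of their bounding boxes would give the "margin box" of the cluster.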
Note that font hinting is still very weakly defined, because it is
absolutely not technology-neutral and assumes some physical (or
perceptual) properties of the target devices that are used today:
- It only works well with color or monochrome LCD/LED/plasma flat
  display panels (and only if the target subpixels are individually
  and predictably addressable in a regular rectangular lattice for
  each color plane, and only if the colored subpixels have standard
  chromaticity and are calibrated in a standard color model: newer
  flat panels that use a different RGBY or RGBW color model, or a more
  isotropic triangular/hexagonal lattice, can't even benefit from the
  current proprietary font hinting technologies).
- But it behaves extremely badly for inkjet and laser printing, or
  high-quality polychromatic offset printing, or when the light of a
  single pixel spreads over several surrounding physical pixels of a
  surface with higher pixel density than light beam density (notably
  on CRT displays and on paper with fluid ink, where the "subpixels"
  are not individually addressable), so much so that hinting
  instructions inserted in fonts are simply discarded when printing a
  document initially viewed on screen (as a consequence, some heavily
  hinted fonts become completely unusable when printing and often have
  to be substituted by another one, designed or hinted very
  differently for the printing device).
- It is extremely complicated (time-consuming and costly) to define
  and tune in fonts (both globally and for each glyph), as it requires
  an extremely high level of expertise to understand visually in all
  its aspects (including when kerning sequences of glyphs).
- It generates lots of rendering inconsistencies in various
  applications (notably geometric problems with rotated fonts, or
  color artefacts when the rendered bitmap images are rescaled or
  stretched for display on devices other than the one known to the
  renderer when it was running).
- In addition, it requires a proprietary scripting engine in the font
  renderer, which may be difficult to secure.
- It makes hinted fonts non-portable across systems, independently of
  the target display device finally used. Fonts also have to be hinted
  several times for different proprietary font renderers (or
  versions).
- The specifications of such engines (or just the possibility of
  embedding hinting instructions in fonts) are encumbered by very
  restrictive patents.
- Interest in font hinting will decline, as it only addresses the
  specific limitations of pixel density on some classes of displays,
  whose technology could rapidly become obsolete; it may become much
  simpler to supersample the glyphs to render and let each target
  device scale the image down to its own properties, if this is still
  needed (there now exist excellent filtering algorithms, widely used
  in digital photography and video, for avoiding anisotropic color
  inconsistencies or fuzzy effects on line borders, or for enhancing
  the contrast and maintaining a perceptually correct geometry, when
  bitmap images and their color model are remapped to different
  targets).

Philippe.

"André Szabolcs Szelp" <a.sz.szelp@gmail.com> wrote:
>
> why does the base character in the second example have a different "default"
> fill?
> Even if that would happen to be the most common version, I think you should
> have a consistent base-fill and fill modifiers which does not depend on an
> implied base fill.
>
> On Tue, Jun 15, 2010 at 4:51 PM, Stephen Slevinski <slevinski@gmail.com> wrote:
>
> > Just a few more minutes of your time...
> >
> > I will be dividing my SignWriting proposal into 2 parts. First, encoding
> > the symbols of the ISWA 2010. Second, a technical note describing a
> > lightweight SignWriting Cartesian Markup that can be used with the symbols
> > for script layout.
> >
> > My proposal for encoding the symbols will require 674 code points.
> > * 652 for the BaseSymbols
> > * 6 for the fill modifiers
> > * 16 for the rotation modifiers
> >
> > The SignWriting symbol set defines 37,812 valid symbols. Each of these
> > symbols can be defined with 3 characters: BaseSymbol, fill modifier, and
> > rotation modifier.
> >
> > There are potentially 62,592 character combinations, but not all are
> > valid. Each BaseSymbol has a list of valid fills and valid rotations.
> >
> > A few examples...
> >
> > BaseSymbol 77 (U+1D852), can be viewed by itself. A different glyph is
> > displayed when followed by fill modifier 3 (U+1DA94) and rotation modifier 1
> > (U+1DA98).
> >
> > BaseSymbol 136 (U+1D88D), can be viewed by itself. A different glyph is
> > displayed when followed by fill modifier 1 (U+1DA92) and rotation modifier 2
> > (U+1DA99).
> >
> > All of the symbols are documented in the ISWA 2010 HTML Reference. This
> > reference will be updated as part of the proposal:
> > http://www.signbank.org/iswa
> >
> > It will be proposed that initially fonts have restrictions for size and
> > shape. This restriction should be lifted if a scheme can be created that
> > eliminates the requirement of exact symbol placement for proper script
> > layout.
> >
> > Would such a proposal be close enough to the Unicode standard?
> >
> > Thanks for your time,
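The counts in the quoted proposal can be checked with a few lines of Python, and the two examples suggest a simple mapping from symbol numbers to code points. The sketch below is only an illustration. The offsets are inferred from the four code points quoted above on the assumption that each range is contiguous, which the message does not state. The function name `encode` is hypothetical, and these proposed code points may differ from whatever is eventually encoded.

```python
BASE_SYMBOLS = 652
FILL_MODIFIERS = 6
ROTATION_MODIFIERS = 16

# 652 + 6 + 16 code points, and 652 * 6 * 16 potential three-character sequences.
total_code_points = BASE_SYMBOLS + FILL_MODIFIERS + ROTATION_MODIFIERS
potential_combinations = BASE_SYMBOLS * FILL_MODIFIERS * ROTATION_MODIFIERS

# Offsets inferred from the quoted examples (assumption: contiguous ranges).
BASE_OFFSET = 0x1D852 - 77      # BaseSymbol 77 is quoted as U+1D852
FILL_OFFSET = 0x1DA94 - 3       # fill modifier 3 is quoted as U+1DA94
ROTATION_OFFSET = 0x1DA98 - 1   # rotation modifier 1 is quoted as U+1DA98


def encode(base: int, fill: int, rotation: int) -> str:
    """Build the 3-character sequence BaseSymbol + fill + rotation."""
    if not 1 <= base <= BASE_SYMBOLS:
        raise ValueError("base symbol number out of range")
    if not 1 <= fill <= FILL_MODIFIERS:
        raise ValueError("fill modifier number out of range")
    if not 1 <= rotation <= ROTATION_MODIFIERS:
        raise ValueError("rotation modifier number out of range")
    return chr(BASE_OFFSET + base) + chr(FILL_OFFSET + fill) + chr(ROTATION_OFFSET + rotation)


# Second quoted example: BaseSymbol 136, fill modifier 1, rotation modifier 2.
sequence = encode(136, 1, 2)
code_points = [f"U+{ord(ch):04X}" for ch in sequence]
```

Under these inferred offsets the second example comes out at the same code points as quoted (U+1D88D, U+1DA92, U+1DA99), which is at least consistent with the three ranges being laid out back to back.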
This archive was generated by hypermail 2.1.5 : Mon Jun 21 2010 - 05:39:59 CDT