Re: Refining the idea for the SignWriting proposal

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jun 21 2010 - 05:33:32 CDT

Next message: Andrey V. Lukyanov: "UTF-12"

Previous message: Doug Ewell: "Re: Latin Script"
Maybe in reply to: Stephen Slevinski: "Refining the idea for the SignWriting proposal"
Next in thread: Kenneth Whistler: "Re: Refining the idea for the SignWriting proposal"

It's very interesting, but there are several things to comment on
regarding the concept of a 2D layout for rendering some complex
scripts, and the need to define a markup language separately from a
limited set of abstract characters (which may cover several ranges of
symbolic glyphs that are more or less geometrically related) that,
alone, can't represent the full script in a semantically meaningful
way.

If the markup language is simple enough, the semantics could be
searchable in plain text, but there's no guarantee that this will be
possible unless the collection of abstract characters is large enough
to be selective (for SignWriting, we could identify "words" and
possibly create a stable plain-text "orthography" if this markup
language is stable and normalizable, with the newly encoded script
properties of the characters needed by this language).

Anyway, it's also interesting to see how the iswa.org server is
already using a "lightweight SignWriting Cartesian Markup" on its
image server to build images from a markup string passed to an
"image.php" server-side script that takes:
- a string of the form "symbolId,x,y,symbolId,x,y,..."
- and some additional separate parameters such as "size=".
The characters are currently encoded with the ISWA "symbolId", which
is a long string of small decimal numbers separated by hyphens,
structured in groups where the first numbers indicate their category,
symbol group, rotation and filling; converting these symbol ids into
shorter Unicode/ISO/IEC 10646 code points still requires some
investigation into which groups (defined by some hyphen-separated
symbolId prefix) need separate encoding.
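
To explore which prefixes might deserve separate encoding, the ids
could be bucketed by their leading hyphen-separated fields. This is
only a sketch: the function name, the prefix depth and the sample ids
are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_prefix(symbol_ids: Iterable[str], depth: int) -> Dict[str, List[str]]:
    """Bucket ids by their first `depth` hyphen-separated fields."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for symbol_id in symbol_ids:
        prefix = "-".join(symbol_id.split("-")[:depth])
        groups[prefix].append(symbol_id)
    return dict(groups)


# Hypothetical ids, for illustration only.
sample_ids = [
    "01-01-001-01-01-01",
    "01-01-001-01-02-01",
    "01-02-001-01-01-01",
]
groups = group_by_prefix(sample_ids, depth=2)
```

Comparing the bucket sizes at different depths would show how many
code points each grouping choice costs.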

Notably, I'm not sure whether the various rotations (or mirrorings,
or alternate fillings, or colors) are the same abstract character to
encode, or whether they should be encoded separately, given that the
collection of symbols is not enough to convey an actual meaning
without the markup needed to create the actual meaningful grapheme
clusters (for SignWriting), or to spatially describe a full scene
(e.g. in DanceWriting).

It may be interesting to see whether such Cartesian markup can be
used for other scripts with 2D layout (notably hieroglyphs, or for
approximating sinograms built from basic strokes).

Possibly, the characters needed to represent the 2D layout in the
"markup language" could become new "format control" characters (in
addition to those needed to represent the symbols/traits components of
meaningful grapheme clusters), encoded as such in Unicode and possibly
usable for several ranges of complex scripts with similar 2D layouts,
notably because the meaningful grapheme clusters will become
searchable in plain-text.

This also suggests a new separate general category for the abstract
symbols/traits encoded for such complex scripts, instead of assigning
them to "gc=Lo" or defining them as unrelated symbols in "gc=S*":
possibly "gc=Lx"?

----
One problem is how the "simple 2D markup language" defines the
coordinate system and how it relates to the unit size of glyphs
before they are positioned with x,y coordinates, and how the x,y
coordinates can be normalized.
There also does not seem to be any encoding (in this running image
server) to define separate relative sizes for symbols before they are
positioned.
I also don't know how the image server infers the total image width
of the composed cluster: is the resulting image (when using the same
value for the "size=..." query parameter) assumed to be a bitmap
constrained within the same display width and height?
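
On the normalization question, one possible convention (an assumption
for illustration, not something the image server is known to do) is to
translate each cluster so that its smallest x and y become zero, then
sort the symbols into a fixed order. Two clusters that differ only by
a global offset or by listing order then compare equal:

```python
from typing import List, Tuple

Placed = Tuple[str, int, int]  # (symbolId, x, y)


def normalize(symbols: List[Placed]) -> List[Placed]:
    """Shift the cluster to a (0, 0) origin and sort by (y, x, id)."""
    if not symbols:
        return []
    min_x = min(x for _, x, _ in symbols)
    min_y = min(y for _, _, y in symbols)
    shifted = [(sid, x - min_x, y - min_y) for sid, x, y in symbols]
    # Caveat: sorting discards the original stacking order, which would
    # matter if foreground symbols clip the ones behind them.
    return sorted(shifted, key=lambda s: (s[2], s[1], s[0]))


# Hypothetical ids "A" and "B", for illustration only.
first = normalize([("B", 15, 5), ("A", 10, 20)])
second = normalize([("A", 110, 120), ("B", 115, 105)])
```

This says nothing about how coordinates relate to glyph unit size,
which remains an open point.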
----
There seem to exist only enumerated values for fill types and
rotation types. Other hieroglyphic and ideographic writing systems
may need more values (notably "DanceWriting", described on the same
site but using additional symbols for feet and lower parts of the
body, or more complex movements).
I don't think that the x,y coordinates for positioning symbols, or
rotation types, or mirror types (or even stretching/narrowing) should
be encoded as separate modifiers. But fill types certainly look like
good candidates for variant encodings.
However, the proposal encodes symbols with a canonical color (defined
as an sRGB 6-digit hex number, as in HTML and CSS). If color conveys
some meaning, it should be encodable in the markup (maybe this is
already possible). Are there symbols that have a different meaning
when they are colored differently? Here also the canonical color
should be inferred, even if it may be styled.
Maybe the ideal system should use a common standard for such markup
(and in fact it could be applied as well to the layout of mathematics,
if that's possible). But there will remain issues such as the variable
shaping of some symbols (like cartouches in Egyptian hieroglyphs, or
similar surrounding/overlay strokes or arrows in maths notation). Is
there a way to make it compatible or convertible with MathML (where
color is also an option that may have some meaning)?
----
For me, there already exists a standard markup for Cartesian layout:
the W3C's SVG standard. It can already easily perform all the needed
2D positioning and basic transformations (rotations, mirroring,
stretching/narrowing, and even slanting), as well as changing the
default color of symbols (if they are defined as SVG primitives or
groups and referenced by their ID, where they would be colored by the
"fill:" style property, provided that there's no color reassignment
within the defined shapes), or the font for text primitives (fonts are
special because their glyphs are colored by the "color:" property,
which is then used to recolor filled shapes and sometimes stroked
shapes, through inheritance and defaults).
Note also that some symbols may at first appear to be drawn with a
transparent background, but could be used in a layout where they hide
some parts of a background glyph. This does not mean that those
foreground glyphs will be drawn by filling them with white, but that
some clipping will occur within the background shapes, so that the
resulting cluster will still have a transparent background (including
within holes).
When performing the layout, there may then be at least two actual
shapes to render simultaneously for each symbol: one will clip the
background shapes (computing an intersection with negated masking
shapes), another will add an area to draw on top of it (computing a
union of shapes, possibly with distinct coloring classes if some
glyphs are multicolored).
----
The bad thing is that the SVG standard was not designed with the
simple goal of defining simple 2D glyphs for use in fonts. It may be
too complex for this purpose, but maybe there's a way to define and
standardize a reduced SVG profile just to meet this goal, excluding:
- almost all complex "fill" attributes such as pattern filling and
color effects, including opacity, color names and even the sRGB color
space (all replaced by a small standard enumeration of coloring
classes such as {clipped/transparent/background, foreground-1,
foreground-2...}). The shapes making up a glyph may still be given
some color/clipping classes, without defining their actual rendering
color (which will then be stylable separately).
- complex geometry features such as strokes: transforming SVG strokes
into SVG filled shapes requires computing stroke widths and dashes,
limiting miter joins, computing rounded/square joins... and using a
separate "stroke" paint property (which also overrides the main
"color:" style property).
- other extensions that may be added soon, like a 3D coordinate
system and 3D-to-2D projections (including perspective computed
efficiently with 4x4 matrices): it can be assumed that all shapes will
be 2D and their geometry fully precomputed in a single coordinate
system (possibly even without internal 2D transforms).
- support for embedded CSS styling (which should be defined and used
outside of this simple SVG profile).
- all other unneeded or insecure extensions such as scripting.
Note that to support color and clipping in an external stylesheet,
when all color and transparency/clipping information is removed from
the SVG definition, it will be necessary to have standard class names
for the shapes present in glyphs: "margins" for the outer convex
shape to reserve as margins on the final rendered image and used to
compute the default bounding box; "background" for the transparent
area of the glyph to clip out from other background glyphs; and
"foreground", optionally combinable with the "color1", "color2"...
class names for multicolor glyphs when some shapes don't necessarily
use the main "foreground" color.
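
As a rough illustration of such class names, the sketch below builds
one glyph group whose shapes carry only a class and a path, with no
color or paint information at all. The glyph id, the path data and the
helper names are hypothetical; only the class names come from the
paragraph above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("svg", SVG_NS)

BASE_CLASSES = {"margins", "background", "foreground"}


def is_allowed_class(name: str) -> bool:
    """Accept the base classes plus numbered 'colorN' classes."""
    return name in BASE_CLASSES or (
        name.startswith("color") and name[5:].isdigit()
    )


def make_glyph(glyph_id, shapes):
    """Build a group of paths from (class_names, path_data) pairs."""
    group = ET.Element("{%s}g" % SVG_NS, {"id": glyph_id})
    for class_names, path_data in shapes:
        for name in class_names.split():
            if not is_allowed_class(name):
                raise ValueError("class not in the reduced profile: " + name)
        ET.SubElement(
            group, "{%s}path" % SVG_NS, {"class": class_names, "d": path_data}
        )
    return group


glyph = make_glyph("glyph-example", [
    ("margins", "M0 0H10V10H0Z"),
    ("background", "M2 2H8V8H2Z"),
    ("foreground", "M3 3H7V7H3Z"),
    ("foreground color1", "M4 4H6V6H4Z"),
])
svg_text = ET.tostring(glyph, encoding="unicode")
```

An external stylesheet could then map each class to an actual color
or to clipping behaviour, outside the glyph definition.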
All the shapes should preferably be ordered by ascending coloring
class (but this may require precomputing the clipped geometry between
the various colors, something that would create more complex shapes
and is probably not needed when this can be performed by the SVG
renderer). If several successive shapes of a defined glyph share the
same color class, they should be merged into the same SVG group or
into a single polyline primitive.
----
Conversely, there's probably no problem in accepting (with such a
reduced SVG profile for defining font glyphs) support for "groups"
<svg:g> with internal 2D transforms and repositioning, or for basic
shapes like circles/ellipses/rectangles in addition to closed
polylines (including straight segments and Bezier curves, as long as a
bounding box can be computed simply from the convex hull of control
points, after all possible 2D transforms).
And finally, assembling the cluster will require computing the
resulting bounding box from the extrema of the bounding boxes of all
symbol components (defined in fonts or in the simple SVG profile), and
probably also a "margin box" from the extrema of the margin boxes of
all symbol components.
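
The two bounding-box steps can be sketched as follows. A Bezier curve
lies within the convex hull of its control points, so the box around
the transformed control points is a safe (possibly loose) bound. The
function names and sample points are assumptions for illustration; the
six-number transform follows SVG's matrix(a b c d e f) convention.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Affine = Tuple[float, float, float, float, float, float]  # a b c d e f

IDENTITY: Affine = (1, 0, 0, 1, 0, 0)


def apply_affine(m: Affine, p: Point) -> Point:
    """x' = a*x + c*y + e ; y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + c * y + e, b * x + d * y + f)


def control_point_box(points: Iterable[Point], m: Affine = IDENTITY) -> Box:
    """Conservative box around the transformed control points."""
    moved = [apply_affine(m, p) for p in points]
    if not moved:
        raise ValueError("no control points")
    xs = [x for x, _ in moved]
    ys = [y for _, y in moved]
    return (min(xs), min(ys), max(xs), max(ys))


def union_boxes(boxes: Iterable[Box]) -> Box:
    """Cluster box from the extrema of the component boxes."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("no component boxes")
    return (
        min(b[0] for b in boxes), min(b[1] for b in boxes),
        max(b[2] for b in boxes), max(b[3] for b in boxes),
    )


curve = [(0, 0), (1, 2), (3, 2), (4, 0)]  # one cubic Bezier segment
box_a = control_point_box(curve)
box_b = control_point_box(curve, (1, 0, 0, 1, 10, 5))  # translated copy
cluster_box = union_boxes([box_a, box_b])
```

The same union_boxes step would apply unchanged to "margin boxes".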
----
This reduced SVG profile could then be used to generate the glyphs
embedded in fonts, except that it will still lack support for
hinting (something that still cannot be expressed in standard SVG
shapes, where the coordinates of internal control points are not
supposed to be adjusted according to the effective final 2D transform
and to the geometric/sampling properties or color masks of the target
rendering area).
Note that font hinting is still something that is very weakly defined,
because it is absolutely not technology-neutral and assumes some
physical (or perceptual) properties of the target devices that are
used today:
- It only works well with color or monochrome LCD/LED/plasma flat
display panels (and only if the target subpixels are individually and
predictably addressable in a regular rectangular lattice for each
color plane, and only if the colored subpixels have standard
chromaticity and are calibrated in a standard color model: newer flat
panels that use a different RGBY or RGBW color model, or a more
isotropic triangular/hexagonal lattice, can't even benefit from the
current proprietary font hinting technologies).
- But it behaves extremely badly for inkjet and laser printing, or
high-quality polychromatic offset printing, or when the light of a
single pixel spreads over several surrounding physical pixels of a
surface with higher pixel density than light beam density (notably on
CRT displays and on paper with fluid ink, where the "subpixels" are
not individually addressable), so much so that hinting instructions
inserted in fonts are simply discarded when printing a document
initially viewed on screen (as a consequence, some heavily hinted
fonts become completely unusable when printing and often have to be
replaced by another one, designed or hinted very differently for
the printing device).
- It is extremely complicated (time-consuming and costly) to define
and tune in fonts (both globally and for each glyph), as it requires
an extremely high level of expertise to understand visually in all
aspects (including when kerning sequences of glyphs).
- It generates lots of rendering inconsistencies in various
applications (notably geometric problems with rotated fonts, or color
artefacts when the rendered bitmap images are rescaled or stretched
for display on devices other than the one known to the renderer when
it was running).
- In addition, it requires a proprietary scripting engine in the font
renderer, which may be difficult to secure.
- It makes hinted fonts non-portable across systems, independently of
the target display device finally used. Fonts also have to be hinted
several times for different proprietary font renderers (or versions).
- The specifications of such engines (or just the possibility of
embedding hinting instructions in fonts) are encumbered by very
restrictive patents.
- Interest in font hinting will decline, as it only addresses the
specific limitations of pixel density on some classes of displays,
whose technology could rapidly become obsolete; it may become much
simpler to supersample the glyphs to render and let each target
device scale down the image to its own properties, if this is still
needed (there now exist excellent filtering algorithms, widely used
in digital photography and video, for avoiding anisotropic color
inconsistencies or fuzzy effects on line borders, or for enhancing
the contrast and maintaining a perceptually correct geometry, when
bitmap images and their color model are remapped to different
targets).
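
The supersampling idea in the last point can be sketched with the
simplest possible filter, a plain box average over each block of the
oversized rendering. This is only an illustration under that
assumption; the better filters alluded to above would replace the
averaging step. The function name and the sample mask are
hypothetical.

```python
from typing import List


def downsample(coverage: List[List[int]], factor: int) -> List[List[float]]:
    """Average factor x factor blocks of a supersampled 0/1 mask."""
    if not coverage or not coverage[0]:
        return []
    height, width = len(coverage), len(coverage[0])
    if height % factor or width % factor:
        raise ValueError("mask size must be a multiple of the factor")
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            total = sum(
                coverage[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            row.append(total / (factor * factor))
        result.append(row)
    return result


# A 4x4 mask rendered at twice the target resolution.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
pixels = downsample(mask, 2)
```

Each output value is the fraction of the target pixel covered by the
glyph, which the device can then map to its own gray or color levels.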
Philippe.
"André Szabolcs Szelp" <a.sz.szelp@gmail.com> wrote:
>
> why does the base character in the second example have a different "default"
> fill?
> Even if that would happen to be the most common version, I think you should
> have a consistent base-fill and fill modifiers which does not depend on an
> implied base fill.
>
> On Tue, Jun 15, 2010 at 4:51 PM, Stephen Slevinski <slevinski@gmail.com> wrote:
>
> > Just a few more minutes of your time...
> >
> > I will be dividing my SignWriting proposal into 2 parts.  First, encoding
> > the symbols of the ISWA 2010.  Second, a technical note describing a
> > lightweight SignWriting Cartesian Markup that can be used with the symbols
> > for script layout.
> >
> > My proposal for encoding the symbols will require 674 code points.
> > * 652 for the BaseSymbols
> > * 6 for the fill modifiers
> > * 16 for the rotation modifiers
> >
> > The SignWriting symbol set defines 37,812 valid symbols.  Each of these
> > symbols can be defined with 3 characters: BaseSymbol, fill modifier, and
> > rotation modifier.
> >
> > There are potentially 62,592 character combinations, but not all are
> > valid.  Each BaseSymbol has a list of valid fills and valid rotations.
> >
> > A few examples...
> >
> > BaseSymbol 77 (U+1D852) , can be viewed by itself.  A different glyph is
> > displayed when followed by fill modifier 3 (U+1DA94) and rotation modifier 1
> > (U+1DA98) .
> >
> > BaseSymbol 136 (U+1D88D) , can be viewed by itself.  A different glyph is
> > displayed when followed by fill modifier 1 (U+1DA92) and rotation modifier 2
> > (U+1DA99) .
> >
> > All of the symbols are documented in the ISWA 2010 HTML Reference.  This
> > reference will be updated as part of the proposal:
> > http://www.signbank.org/iswa
> >
> > It will be proposed that initially fonts have restrictions for size and
> > shape.  This restriction should be lifted if a scheme can be created that
> > eliminates the requirement of exact symbol placement for proper script
> > layout.
> >
> > Would such a proposal be close enough to the Unicode standard?
> >
> > Thanks for your time,


This archive was generated by hypermail 2.1.5 : Mon Jun 21 2010 - 05:39:59 CDT