Cuneiform Free Variation Selectors

From: Dean Snyder (dean.snyder@jhu.edu)
Date: Sun Jan 18 2004 - 14:13:09 EST

    It took Devanagari to show me the NATURE of the technical problem posed
    by a dynamic encoding for cuneiform; it took Mongolian to show me that
    the problem HAS ALREADY BEEN SOLVED in Unicode.

    As in Devanagari, dynamic cuneiform must be capable of mapping a sequence
    of encoded characters to a single unencoded glyph; but unlike Devanagari,
    the glyph to select for a given character sequence in cuneiform is not
    predictable.

    Mongolian has this very problem, and it has solved it in Unicode by the
    introduction of shaping format and variant selector character codes:

    The Unicode Standard, version 4
    ch. 12.2 Mongolian, p 326-327

    >For cases in which the contextual sequence of basic letters is not
    >sufficient for a rendering engine to uniquely determine the appropriate
    >glyph for a particular letter, additional format characters are provided
    >so that the typist may specify the desired rendering.
    >...
    >Except for U+202F NARROW NO-BREAK SPACE and U+180E MONGOLIAN VOWEL
    >SEPARATOR, these characters normally have no visual appearance. Their
    >sole purpose is to guide the rendering process in selecting the
    >appropriate glyphs to represent base Mongolian letters in a particular
    >context.
    >...
    >The three MONGOLIAN FREE VARIATION SELECTOR characters are used to
    >distinguish different variants of the same letter appearing under the
    >same conditions -- that is, where more than one rendered shape is possible
    >and the selection must be indicated by human intervention rather than
    >derived by algorithm. A free variant selector immediately follows the
    >base character it modifies. Free variant selectors are not productive and
    >are therefore ignored when not immediately preceded by one of their
    >listed base characters.
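
    The rule in the last two quoted sentences can be sketched in a few lines
    of Python. This is only an illustration under stated assumptions, not
    code from the Unicode Standard or from any rendering engine: the
    function name select_glyphs, the VARIANTS table, and the glyph names are
    hypothetical. Only the three selector code points (U+180B..U+180D) and
    U+1820 MONGOLIAN LETTER A are real.

```python
# MONGOLIAN FREE VARIATION SELECTOR ONE, TWO, THREE
FVS = {"\u180B", "\u180C", "\u180D"}

# Hypothetical table: (base, selector) -> glyph name. Illustrative only;
# a real process would take its listed pairs from the Unicode data files.
VARIANTS = {
    ("\u1820", "\u180B"): "a.variant2",
}

def select_glyphs(text, variants=VARIANTS):
    """Map a string to glyph names, honoring only listed base+selector pairs."""
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in FVS:
            # A selector that was not consumed by a listed base is ignored.
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else None
        if nxt in FVS and (ch, nxt) in variants:
            # The selector immediately follows a listed base: pick the variant.
            glyphs.append(variants[(ch, nxt)])
            i += 2
        else:
            # No listed pair here: fall back to the default glyph.
            glyphs.append("default:U+%04X" % ord(ch))
            i += 1
    return glyphs
```

    A listed pair selects the variant glyph; an unlisted pair or a stray
    selector falls back to the default glyph, and the selector itself
    produces no output.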

    There are, of course, differences in detail between the Mongolian model
    and a proposed dynamic cuneiform model, but the basic architectural
    concept would be the same for both.

    Thus there is NO technical reason in Unicode for jettisoning a dynamic
    model for cuneiform. And this is exactly the kind of technical
    information I have been looking for all along on these email lists.

    Furthermore, I believe cuneiform, even though it has more selectors (14
    instead of 7) than Mongolian, is actually simpler in this area because,
    unlike Mongolian, the ligating system in early cuneiform was productive.
    This would obviate the need for a standardized variants table which
    all conformant processes would have to honor:

    >The table of standardized variants, StandardizedVariants.txt, in the
    >Unicode Character Database provides a description of the variant
    >presentation glyphs corresponding to the use of specified variation
    >selectors with all allowed base Mongolian characters. Only some
    >presentation forms of the base Mongolian characters produce variant
    >presentation glyphs, when immediately followed by the Mongolian free
    >variation selectors. These combinations are exhaustively listed and
    >described in the table. All combinations not listed in the table are
    >unspecified and are reserved for future standardization; no conformant
    >process may interpret them as standardized variants.
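
    To make the table-driven model concrete, here is a rough sketch of how a
    process might load such a table and refuse unlisted combinations. It
    assumes the semicolon-separated layout of StandardizedVariants.txt
    (variation sequence; description; shaping environments; then a "#"
    comment). The sample lines are illustrative stand-ins in that layout,
    not a copy of the actual data file, and the function names are
    hypothetical.

```python
# Illustrative lines in the assumed file layout, not copied from the UCD.
SAMPLE = """\
# sequence; description; shaping environments # character name
1820 180B; second form; isolate medial final # MONGOLIAN LETTER A
2268 FE00; with vertical stroke; # LESS-THAN BUT NOT EQUAL TO
"""

def parse_variants(text):
    """Return {(base, selector): (description, [shaping environments])}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        base_hex, selector_hex = fields[0].split()
        description = fields[1] if len(fields) > 1 else ""
        contexts = fields[2].split() if len(fields) > 2 else []
        key = (chr(int(base_hex, 16)), chr(int(selector_hex, 16)))
        table[key] = (description, contexts)
    return table

def is_standardized(table, base, selector):
    """Only combinations listed in the table count as standardized variants."""
    return (base, selector) in table

table = parse_variants(SAMPLE)
```

    Under a productive model of the kind argued for here, the lookup in
    is_standardized would not be needed, since no fixed list of pairs would
    have to be consulted.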
     
    Finally, the case for variant selectors is even stronger for dynamic
    cuneiform than for Mongolian because, unlike Mongolian, there are
    practically no cases where the "use of an extended context and rules for
    shape selection" in cuneiform will help in glyph selection:

    >Use of these free variation selectors [in Mongolian] is not the only
    >way that the associated shapes can be selected. Use of an extended
    >context and rules for shape selection can obviate the need for using
    >these variation selectors in many ordinary situations.

    Moreover, the choice of glyph shape in Mongolian is typically a graphic
    issue: the word will look odd if you pick the wrong glyph. In cuneiform
    it is typically a semantic issue: it will be a different word if you pick
    the wrong glyph.

    Respectfully,

    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    www.jhu.edu/digitalhammurabi


