CSS3, Unicode BIDI, and Vertical Text Layout

From: fantasai (fantasai.lists@inkedblade.net)
Date: Wed Oct 20 2004 - 12:05:05 CST

    I've been working on a revision of CSS3 Text to clean up its handling of
    vertical text layout. An explanation of the system, targeted at the members
    of this list, is up at

        http://fantasai.inkedblade.net/style/discuss/vertical-text/

    The full text follows for archival and discussion purposes. If you want to
    /read/ the document, you're best off with the HTML version; it has real
    links and embedded diagrams.

    Comments are welcome. Please post them here so everyone can read. :)

    ~fantasai

    ###########################################################################

    Robust Vertical Text Layout

         by fantasai <http://fantasai.inkedblade.net/contact>

        Few formatting systems today can handle vertical text layout, and most
        of those only lay out text in right-to-left columns. This document
        outlines a system that can not only handle common scripts in vertical
        right-to-left columns, but that can _gracefully_ accept uncommon
        script combinations and left-to-right text columns. The model is
        described here as a CSS system, but the concepts can apply to non-CSS
        systems as well.

        The CSS model and Unicode provide support for logical text layout, but
        only in horizontal flow. Although CSS3 Text attempts to use horizontal
        BIDI controls to handle vertical BIDI, the system it sets up is
        ill-defined and inflexible, relies on assumptions that may not hold
        true, and requires a styled document's content and its markup to be
        adapted to the CSS rather than the other way 'round. A better design
        would use the intrinsic properties of the characters and an expansion
        of Unicode's logic to lay out the text. A layout model thus based on
        the logic and knowledge of writing systems can scale to gracefully
        handle any combination of scripts, can correctly (if not optimally)
        lay out text with any combination of styling properties, and can
        integrate well with the layered Unicode + Markup + Styling model of
        semantically-tagged documents.

        The examples in this text require support for Unicode BIDI and Arabic
        shaping, and fonts for Simplified Chinese and Arabic/Farsi. Most
        diagrams are available in SVG, but inline versions are in PNG with
        fallbacks in GIF.

        Recommended browsers (recent versions):
          * Opera <http://www.opera.com/>
          * Gecko-based (such as Mozilla, Firefox, or Camino (Mac OS X))
            <http://www.mozilla.org/>
          * Microsoft Internet Explorer or Safari may suffice.

        More about Unicode fonts and other software
        <http://www.alanwood.net/unicode/>

         1. Background
              1. The "Cascade" in Cascading Style Sheets
              2. CSS and Unicode Bidi
         2. Abusing Directionality and Its Consequences: A Case Study of
            CSS3 Text
         3. Describing Text Flow
              1. Physical vs. Logical Description
              2. Intrinsic Directionality and Orientation
                   1. Script Classification by Directionality
              3. Logical Text Flow
                   1. Implying Direction
                   2. Orienting by Block Progression
                   3. The Three Switches of Logical Text Layout
         4. Implementing A Logical Text Layout System
              1. Composing Lines of Text
                   1. Character Ordering
                   2. Glyph Orientation
                        1. Vertical Scripts
                        2. Horizontal Scripts
                        3. Punctuation
                   3. Character Shaping
              2. Understanding Character Properties
         5. Why and How the Unicode Consortium Should Be Involved
              1. What happens if Unicode chooses not to standardize the
                 additional character data?
         6. About the Author and the Status of CSS3 Text
         7. Acknowledgements
         8. Appendix: Vertical Scripts in Horizontal Flow

    Background

    The "Cascade" in Cascading Style Sheets

        Unlike many formatting systems, in which styling properties are
        definitively applied to a page element at one point, CSS collects and
        applies to the element multiple style rules from the author, reader,
        and user agent. In case of a conflict, the origin of the rule and the
        specificity of the rule's element selector determine which of the
        conflicting property values takes effect on the element. This process
        of sorting and applying style rules is called cascading[1], and it
        allows style rules from multiple sources and with separate formatting
        purposes to interact in a rigorous way.

        [1] http://www.w3.org/TR/CSS21/cascade.html#cascade

        *Cascading means that style properties specified together are not
        guaranteed to take effect together.* This raises the design standards
        for creating CSS properties and pushes them towards a more logical,
        rather than physical, description of the intended design.
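
        A minimal sketch of that consequence, under simplifying assumptions:
        the origin ranking, the single-number specificity, and the names
        Declaration and cascade are illustrative only and are not taken from
        the CSS specification. It shows two declarations written together by
        the author, only one of which survives when the user agent does not
        support the other property.

```python
from dataclasses import dataclass

# Simplified origin ranking for normal declarations (assumption for this sketch).
ORIGIN_RANK = {"user-agent": 0, "user": 1, "author": 2}


@dataclass(frozen=True)
class Declaration:
    prop: str
    value: str
    origin: str       # "user-agent", "user", or "author"
    specificity: int  # collapsed to one number for illustration


def cascade(declarations, supported):
    """Pick one winning value per property, independently of other properties."""
    winners = {}
    for decl in declarations:
        if decl.prop not in supported:
            continue  # an unsupported property is dropped on its own
        key = (ORIGIN_RANK[decl.origin], decl.specificity)
        best = winners.get(decl.prop)
        # ">=" lets a later declaration win a tie, mimicking source order.
        if best is None or key >= best[0]:
            winners[decl.prop] = (key, decl.value)
    return {prop: value for prop, (_key, value) in winners.items()}


rules = [
    Declaration("direction", "ltr", "user-agent", 0),
    Declaration("direction", "rtl", "author", 1),
    Declaration("block-progression", "lr", "author", 1),
]

full = cascade(rules, supported={"direction", "block-progression"})
partial = cascade(rules, supported={"direction"})
```

        In the second call the author's 'direction: rtl' still applies even
        though its companion 'block-progression: lr' was dropped, which is
        the situation a physically-described pair of properties cannot
        survive.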

    CSS and Unicode BIDI

        CSS2 introduced the direction and unicode-bidi properties to
        incorporate markup directives such as HTML's dir attribute into the
        CSS rendering model, and to allow the use of markup semantics in
        assigning BIDI embeddings. The direction property can take the values
        ltr and rtl, and this value inherits to descendant elements. The
        unicode-bidi property assigns embeddings and overrides in the
        direction given by the direction property. Its behavior is defined in
        terms of the Unicode embedding and override codes.

           /* map 'dir' attribute to 'direction' + embedding*/
           *[dir="ltr"] {direction: ltr; unicode-bidi: embed; }
           *[dir="rtl"] {direction: rtl; unicode-bidi: embed; }
           /* embed quotations so they always stay as a single unit */
           q {unicode-bidi: embed;}

        When applied to a block of text, the direction property specifies the
        block's embedding direction; CSS documents do not use heuristics to
        guess the block's embedding direction.

        These properties are meant to reflect BIDI distinctions necessary for
        the proper ordering of text. Authors in general are discouraged[2]
        from using the properties in favor of the direct markup that would
        trigger the appropriate values.

        [2] http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030728.092130948

    Abusing Directionality and Its Consequences: A Case Study of CSS3 Text CR

        CSS3 Text was intended to update and expand the text layout
        capabilities of CSS2 by adding support for more international
        typesetting features and introducing controls for laying out vertical
        text. It defines a 'block-progression' property, which switches the
        line stacking direction, and hijacks 'rtl' and 'ltr' values of the
        'direction' property to use as an inline-progression control in
        vertical text.

        Cite: http://www.w3.org/TR/2003/CR-css3-text-2003051/#writing-mode

         | writing-mode: | direction: | block-progression: | Common Usage:
         | lr-tb         | ltr        | tb                 | Latin-based, Greek, Cyrillic
         |               |            |                    | (and many others)
         | rl-tb         | rtl        | tb                 | Arabic, Hebrew
         | tb-rl         | ltr        | rl                 | Chinese, Japanese, Korean
         | tb-lr         | rtl        | lr                 | Traditional Mongolian

        It is a good example of how _not_ to set up a vertical text system.

        In order to interface with the Unicode BIDI Algorithm[3], CSS3 Text
        maps vertical scripts' character directionality based on the
        paragraph's block progression.

        [3] http://www.unicode.org/reports/tr9/

        Because all vertical scripts in Unicode are assigned a canonical
        directionality of left-to-right, BIDI reordering proceeds as normal
        when the text columns are stacked right-to-left.

        However, if the columns of text are stacking the other way--from
        left to right--then the same characters (which so far are all given
        left-to-right directionality in Unicode) are treated as right-to-left
        characters (R). This was done because left-to-right scripts such as
        Latin read bottom to top when the lines of text were ordered left to
        right. You can see this often on image and table captions when the
        text runs along the left side. The first line of text runs from bottom
        to top, and lines stack with each one to the right of the previous
        (left to right). In this case, top-to-bottom scripts _must_ go in the
        direction _opposite_ the left-to-right text, and the opposite of ltr
        is rtl.

        Messing with the directionality of vertical scripts messes with other
        bits of text layout as well, and much of this interaction was left
        undefined. Character shaping, for example, depends on the
        BIDI-reordered string being in normal order. Not only character
        ordering, but the character shaping algorithm and the font rendering
        code all need to compensate for the altered input to the BIDI
        algorithm, and CSS3 Text failed to explain the necessary changes.

        For example, Mongolian is a cursive vertical script. Like Arabic (to
        which it is related), it is also a shaping script: a letter at the
        beginning of a word is shaped differently from one in the middle or at
        the end. Unicode defines Mongolian to be a left-to-right script, so
        shaping makes the leftmost character of a word into an initial and the
        rightmost character into a final. If, however, the Mongolian word is
        ordered right-to-left, then the initial letter of the word will be on
        the right, and therefore shaped as a final and not an initial. This is
        because shaping happens _after_ reordering. Vertical Mongolian text
        treated like this will look upside down and read like a bunch of
        gibberish, and no amount of glyph rotation can fix the problem. To
        make the letters connect properly under the CSS right-to-left
        override, the Mongolian parts of the text would need to be shaped in
        reverse and then have their glyphs rotated 180°--but this is not even
        mentioned in CSS3 Text.
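
        The failure can be illustrated with a toy model. This is only a
        sketch of the behaviour described above, not a real shaping engine:
        the function names and the placeholder letters M1..M3 are
        hypothetical, and the shaper simply assumes, as for a left-to-right
        script, that the leftmost letter of the reordered run is the initial.

```python
def positional_forms(count):
    """Positional forms for a word of `count` letters, first letter first."""
    if count == 1:
        return ["isolated"]
    return ["initial"] + ["medial"] * (count - 2) + ["final"]


def shape_visual(visual_letters):
    """Toy shaper: treats the leftmost letter of the run as the initial."""
    forms = positional_forms(len(visual_letters))
    return list(zip(visual_letters, forms))


word = ["M1", "M2", "M3"]  # logical order; M1 is read first

# Normal case: the run keeps its left-to-right order before shaping.
normal = shape_visual(word)

# Run forced right-to-left before shaping: the visual order is reversed,
# so the shaper sees M3 on the left and M1 on the right.
reversed_run = shape_visual(word[::-1])
```

        In the reversed run the first letter of the word, M1, receives the
        final form, which is the mismatch that glyph rotation alone cannot
        repair.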

        To accommodate CSS3 Text's ill-defined tweaks to BIDI reordering
        (and character shaping and font rendering), a layout system can't
        simply pass the string to standard Unicode processing functions.
        Assume, however, that the layout system manages to hold up internally
        the pretense that "top-to-bottom" is "right-to-left". It still needs
        to interact with BIDI instructions from the outside world, which
        doesn't share the delusion. In an effort to make the outside world
        _seem_ like it's adapted to these changes, CSS3 Text instructs the
        designer to use 'direction: rtl' when assigning 'block-progression: lr'
        to a block of top-to-bottom text (such as Mongolian or Chinese, both
        actually ltr scripts), in effect asking him to lie about the text's
        properties. Like most lies, it seems to work in the general case, but
        as the situation gets complicated, the system breaks down...

          * Foremost, if the expected block progression fails to take effect--
            whether through the cascade or through lack of UA support--the
            text direction and the assigned embedding direction no longer match
            and the subtleties of Unicode BIDI can wreak havoc on the order of
            the text.
                                 <example>

          * CSS embeddings set on elements _within_ the formatted block are no
            longer necessarily going the right way.
                                 <example>

          * HTML dir attributes that were added with the assumption of
            regular, horizontal text might or might not need to have their
            effects be reversed.
                                 <example>

          * There is no mention of how character shaping should happen.

        In conclusion, abusing directionality controls to make a limited
        system lay out text correctly doesn't scale. It's a hack, not a
        solution.

    Describing Text Flow

        To describe how a text flows into lines, one needs to know three
        things:

          Image: Three Vectors
                 <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/text-flow-vectors-tb.svg>

          * which way the text flows within a line (inline progression)
          * which way the lines stack (block progression)
          * which way the glyphs are facing (glyph orientation)

        However, not all combinations of text direction and glyph orientation
        are valid. Therefore if certain of the character's inherent
        characteristics are known, it is often possible to derive one from the
        other. Unicode systems take advantage of this model in horizontal
        text: you don't have to manually tell every run of Hebrew to order
        itself right-to-left, and you don't need to specify that Mongolian
        characters turn themselves sideways when the text is running
        horizontally left-to-right.

    Logical vs. Physical Description

        In a purely physical layout scheme, each of these text layout
        properties would be given as an absolute: The inline progression of
        this run of English is top to bottom, its glyph orientation is 90
        degrees clockwise, its block progression is from right to left.

        Image: Diagram of vectors for rotated English
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/text-flow-vectors-rl.svg>

        However, because the interrelationships among these properties are
        realized in the author's mind and not in the system,
          * The author must manually intervene any time there is a script
            change.
          * If one of the three properties fails to take effect (because of
            the Cascade or lack of UA support), then the layout breaks and the
            text becomes unreadable.

        A better system would embed knowledge of different scripts' intrinsic
        characteristics and define style properties in terms of the
        relationships among them.

    Intrinsic Directionality and Orientation

        Each script has a characteristic writing direction, and each character
        in Unicode is assigned a directionality value based on this
        characteristic. Unfortunately, Unicode currently only defines
        horizontal directionality even though vertical and bi-orientational
        scripts have a vertical directionality as well. For example, while
        English can go either top to bottom or bottom to top (since it doesn't
        have a vertical directionality), Japanese must only go from top to
        bottom, even in a left-to-right block progression. Mongolian also has
        top-to-bottom vertical directionality. Unlike Japanese however, it has
        no definite horizontal directionality (just a canonical one for BIDI
        purposes).

    Script Classification by Directionality

        Scripts can be classified into three orientational categories:

        horizontal
               Scripts that have horizontal, but not vertical, directionality.
               Includes: Latin, Arabic, Hebrew, Devanagari

        vertical
               Scripts that have vertical, but not horizontal, directionality.
               Includes: Mongolian, Manchu

        bi-orientational
               Scripts that have both vertical and horizontal directionality.
               Includes: Han, Hangul, Yi, Ogham

        Bi-orientational scripts may be further classified by how their glyphs
        transform when switching orientations. CJK characters translate; they
        are always upright. Other scripts, such as Ogham and some variants of
        classical Yi, must be rotated.
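
        As a sketch, this classification could be held in a small lookup
        table. The structure and the names ScriptInfo and SCRIPTS are
        hypothetical; only the script names, categories, and the
        translate/rotate distinction come from the text above, and Yi is
        entered as translating even though some classical variants rotate.

```python
from typing import NamedTuple, Optional


class ScriptInfo(NamedTuple):
    category: str             # "horizontal", "vertical", or "bi-orientational"
    transform: Optional[str]  # how bi-orientational glyphs change orientation


SCRIPTS = {
    "Latin": ScriptInfo("horizontal", None),
    "Arabic": ScriptInfo("horizontal", None),
    "Hebrew": ScriptInfo("horizontal", None),
    "Devanagari": ScriptInfo("horizontal", None),
    "Mongolian": ScriptInfo("vertical", None),
    "Manchu": ScriptInfo("vertical", None),
    "Han": ScriptInfo("bi-orientational", "translate"),
    "Hangul": ScriptInfo("bi-orientational", "translate"),
    "Yi": ScriptInfo("bi-orientational", "translate"),  # some classical variants rotate
    "Ogham": ScriptInfo("bi-orientational", "rotate"),
}


def has_vertical_directionality(script):
    """True if the script has an intrinsic direction in vertical lines."""
    return SCRIPTS[script].category in ("vertical", "bi-orientational")


def has_horizontal_directionality(script):
    """True if the script has an intrinsic direction in horizontal lines."""
    return SCRIPTS[script].category in ("horizontal", "bi-orientational")
```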

    Logical Text Flow

    Implying Direction

        Scripts in their native orientation need no additional stylistic hints
        for proper layout: their inline progression and glyph orientation are
        both intrinsically mandated, so the style system can know by itself
        how to lay them out. *Directionality and glyph orientation overrides
        are not necessary and should not be used.* (In fact, using them
        degrades the system by creating a tangle of dependencies, as
        demonstrated in the section on the current version of CSS3 Text.)

        Scripts in a foreign orientation don't need directionality or glyph
        overrides either. They just need a few hints: whether to translate
        upright, or, if they're rotated sideways, which side is "up". Given
        that, the rules for laying out the text in its native orientation are
        enough to determine the inline progression and exact glyph
        orientation.

    Orienting by Block Progression

        For scripts in a non-native orientation, the natural inline text flow
        depends on the direction of line stacking: the text is most
        comfortably laid out as if the whole text block were merely rotated
        from the horizontal. For example, English text in vertical lines that
        stack from left to right will face with the glyphs' tops towards the
        left and the text direction running from bottom to top. The same text,
        by the same logic, would in a right-to-left line stacking context face
        right and flow within each line from top to bottom.

        Image: Diagram of text block rotation
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/text-flow-natural.svg>

        Note: Merely rotating the rendered text from a horizontal layout is
        not sufficient because while the primary script is horizontal, it may
        include some vertical text (such as Chinese) that would need to be
        laid out appropriately for vertical lines.

        Putting this logic into the style system is straightforward: define
        "up" for non-native glyphs to point to the beginning of the line
        stack, and the inline progression follows from that orientation. The
        glyph orientation and inline progression will thus adapt to whichever
        block progression happens to take effect.
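
        A minimal sketch of that rule for horizontal scripts in vertical
        lines. The function name and the string values it returns are
        illustrative assumptions; the behaviour follows the English example
        above, with right-to-left scripts assumed to flow the opposite way.

```python
def natural_layout(block_progression, directionality="ltr"):
    """Return (side the glyph tops face, inline progression) for a
    horizontal script laid out in vertical lines."""
    if block_progression not in ("lr", "rl"):
        raise ValueError("expected a vertical block progression: 'lr' or 'rl'")
    if directionality not in ("ltr", "rtl"):
        raise ValueError("expected a horizontal directionality: 'ltr' or 'rtl'")

    # "Up" points to the beginning of the line stack.
    up = "left" if block_progression == "lr" else "right"

    # Turning the block so its top faces left makes left-to-right text
    # read upward; turning it to face right makes it read downward.
    ltr_flow = "bottom-to-top" if up == "left" else "top-to-bottom"
    if directionality == "ltr":
        flow = ltr_flow
    else:
        flow = "top-to-bottom" if ltr_flow == "bottom-to-top" else "bottom-to-top"
    return up, flow
```

        Because both results are derived from the block progression, they
        stay consistent with each other whichever block progression ends up
        taking effect.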

        This layout scheme is most appropriate for dealing with text that has
        been turned on its side for layout purposes--as for page headers
        or captions or table headings. However, a major use case for laying
        out text in a non-native orientation is mixing horizontal and vertical
        scripts, which introduces the requirement of making the secondary
        scripts flow well in the context of the primary script.

        For example, a primarily Mongolian document, which has vertical lines
        stacking left to right, usually lays its Latin text with the glyphs
        facing the right. This makes the text run in the same inline
        progression as Mongolian and face the same direction it does in other
        East Asian layouts (which have vertical lines stacking right to left),
        but the glyphs are facing the _bottom_ of the line stack rather than
        the top, something they wouldn't do in a primarily-English paragraph.

        Image: Text Flow Vectors in Mongolian Text
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/mongolian-vectors.jpg>

        Yet another common layout is to keep the horizontal script's glyphs
        upright and order them from top to bottom; this is frequently done
        with Latin-script acronyms in vertical East Asian text.

        Image: <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/vertical-acronym.gif>

        To handle these layouts, the style system needs to offer controls for
        choosing among these different layout schemes. Note, however, that
        scripts in their native orientations do not need these hints; only the
        non-native ones do. Also, only a single, simple scheme switch is needed:
        there's no need for the designer to set separate absolute inline
        progression and glyph orientation controls or to set styling
        properties on each text run of a different script.

    The Three Switches of Logical Text Layout

        In summary, to lay out a block of arbitrary, mixed-script text, the
        layout system needs to offer only three controls:

          * primary script's directionality (BIDI property)
          * block progression direction (stylistic property)
          * glyph orientation scheme (stylistic property)

        Formalized into CSS syntax, this becomes:

        direction
               Primary directionality. Can take the following values

             ltr
                     Left-to-right directionality in horizontal text; No
                     inherent directionality in vertical text. (Horizontal
                     script) Examples: Latin, Tibetan

             rtl
                     Right-to-left directionality in horizontal text; No
                     inherent directionality in vertical text. (Horizontal
                     script) Examples: Arabic, Hebrew

             ttb
                     Top to bottom directionality in vertical text; No
                     inherent directionality in horizontal text. (Vertical
                     script) Example: traditional Mongolian

             ltr-ttb
                     Left to right directionality in horizontal text; Top to
                     bottom directionality in vertical text. (Bi-orientational
                     script) Examples: Han, modern Yi

             ltr-btt
                     Left to right directionality in horizontal text; Bottom
                     to top directionality in vertical text. (Bi-orientational
                     script) Example: Ogham

        block-progression
               Block progression (line stacking) direction. Can take the
               following values

             tb
                     Top-to-bottom line stacking (horizontal text). Typically
                     used for most non-East-Asian layout.

             rl
                     Right-to-left line stacking (vertical text). Typically
                     used for traditional CJK layout.

             lr
                     Left-to-right line stacking (vertical text). Typically
                     used for traditional Mongolian layout.

        text-orientation-vertical
               Glyph orientation scheme to use in vertical text. Can take the
               following values

             natural
                     Non-vertical script runs are laid out as if "up" was
                     towards the top of the line stack (left or right,
                     depending on the block progression in effect). (Vertical
                     scripts are laid out as vertical scripts.)

             left
                     Non-vertical script runs are laid out as if "up" was
                     towards the left side of the line stack. (Vertical
                     scripts are laid out as vertical scripts.)

             right
                     Non-vertical script runs are laid out as if "up" was
                     towards the right of the line stack. (Vertical scripts
                     are laid out as vertical scripts.)

             upright
                     Non-vertical scripts' characters read top to bottom, with
                     each grapheme cluster oriented upright. (Vertical scripts
                     are laid out as vertical scripts.)

               For handling vertical-only scripts in horizontal layout, a
               text-orientation-horizontal property is also necessary; it
               takes effect only when the block progression is top-to-bottom.
               To keep the discussion less verbose, I am relegating
               consideration of horizontal layout to an appendix.

        As long as the directionality is set correctly for the text (and it
        should be set automatically from the content/markup as long as the
        designer doesn't touch it later), any combination of the
        block-progression and text-orientation stylistic values will result in
        a correct (though perhaps not optimally-designed) text layout.
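
        A sketch of how the two stylistic switches might be resolved into
        one effective orientation scheme. The function name is hypothetical,
        and returning None for horizontal lines is an assumption based on
        text-orientation-vertical being described as applying to vertical
        text only.

```python
from itertools import product

BLOCK_PROGRESSIONS = ("tb", "rl", "lr")
ORIENTATIONS = ("natural", "left", "right", "upright")


def effective_orientation(block_progression, text_orientation_vertical):
    """Resolve the orientation scheme that applies to non-vertical scripts."""
    if block_progression not in BLOCK_PROGRESSIONS:
        raise ValueError(f"unknown block-progression: {block_progression!r}")
    if text_orientation_vertical not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {text_orientation_vertical!r}")

    if block_progression == "tb":
        return None  # horizontal lines: the vertical scheme does not apply
    if text_orientation_vertical == "natural":
        # "Up" is towards the top of the line stack.
        return "right" if block_progression == "rl" else "left"
    return text_orientation_vertical


# Every combination resolves without error; none is rejected as invalid.
combos = {
    (bp, to): effective_orientation(bp, to)
    for bp, to in product(BLOCK_PROGRESSIONS, ORIENTATIONS)
}
```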

        The style system can thus handle most of the intricacies of laying out
        both usual and unusual combinations of text by itself. What it needs
        to do this, however, is to know the intrinsic properties of the
        characters and the logic of laying them out.

    Implementing A Logical Text Layout System

    Composing Lines of Text

        Handling block-progression is very straightforward: just stack the
        composed lines in the stacking direction. Composing the lines of text
        is more complicated. The text needs to go through three processing
        steps.

    Character Ordering

        Character ordering is where the BIDI algorithm gets applied. *The
        algorithm remains essentially unchanged when dealing with vertical
        text: what changes is the data.* Specifically, the directionality
        values of certain characters are mapped into the algorithm differently
        depending on the styling context.

        The Unicode Bidirectional (BIDI) Algorithm deals with two
        directions: left-to-right (towards right) and right-to-left (towards
        left), defined to be the same as the script directionalities involved.
        Although this multi-directional model has several more directionality
        values, the BIDI algorithm here still deals with only two directions:
        it just abstracts them so that they could just as easily be
        bottom-to-top (towards top) and top-to-bottom (towards bottom). To
        avoid the apparent absurdity of mapping right to left and such things,
        I will call the two BIDI directions "high" (H) and "low" (L).
        (Implementations, no doubt, will prefer to call them "left" and
        "right" to map directly into the Unicode BIDI algorithm.)

        It is important to keep in mind that these directions are abstract. We
        will map "left", "right", "top", and "bottom" to "high" or "low" based
        on the values of text-orientation and block-progression. *The mapping
        applies to everything: the individual character's directionality,
        embedding and override codes, the CSS direction values, HTML dir
        attributes, etc.* Once the line is composed, we then lock "high" and
        "low" to the appropriate sides of the block as we stack the lines
        according to block-progression.

    Directionality Mapping: Vertical Case

        In vertical context, bi-orientational scripts use their vertical
        directionality and behave as vertical, not horizontal, scripts. Han,
        for example, as a ltr-ttb script, is treated as ttb (top to bottom),
        _not_ ltr (left to right). The ltr-ttb value for direction is
        correspondingly treated the same way as the value ttb.

    For text-orientation: right (and text-orientation: natural in a
    right-to-left block progression):

          * Map ttb and ltr to htl (high to low)
          * Map btt and rtl to lth (low to high)

        Image: Diagram of 'right' Mapping
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/bidi-right.svg>

        Run the Unicode BIDI Algorithm with its "left" being our "high" and
        its "right" being our "low".

    For text-orientation: left (and text-orientation: natural in a left-to-right
    block progression):

          * Map ttb and rtl to lth (low to high)
          * Map btt and ltr to htl (high to low)

        Image: Diagram of 'left' mapping
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/bidi-left.svg>

        Run the Unicode BIDI Algorithm with its "left" being our "high" and
        its "right" being our "low".

    For text-orientation: upright

          * Map ttb, ltr, and rtl to htl (high to low)
          * Map btt to lth (low to high)

        Image: Diagram of 'upright' mapping
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/bidi-upright.svg>

        Run the Unicode BIDI Algorithm with its "left" being our "high" and
        its "right" being our "low".
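
        The three mappings above can be written as a lookup, sketched here
        under the assumption that 'natural' has already been resolved to
        'left' or 'right' from the block progression. The function and table
        names are illustrative.

```python
# Bi-orientational values use their vertical component in vertical context.
VERTICAL_COMPONENT = {"ltr-ttb": "ttb", "ltr-btt": "btt"}

# One table per effective text-orientation scheme.
MAPPINGS = {
    "right":   {"ttb": "htl", "ltr": "htl", "btt": "lth", "rtl": "lth"},
    "left":    {"ttb": "lth", "rtl": "lth", "btt": "htl", "ltr": "htl"},
    "upright": {"ttb": "htl", "ltr": "htl", "rtl": "htl", "btt": "lth"},
}


def map_directionality(directionality, scheme):
    """Map a directionality value to the abstract BIDI direction
    'htl' (high to low) or 'lth' (low to high) for vertical text."""
    value = VERTICAL_COMPONENT.get(directionality, directionality)
    return MAPPINGS[scheme][value]
```

        An implementation would then feed 'htl' to the BIDI algorithm
        wherever it expects left-to-right and 'lth' wherever it expects
        right-to-left.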

    Glyph Orientation

        Before the system can paint the text (or even do alignment), it needs
        to know how to rotate the glyphs. For vertical and bi-orientational
        scripts, this is simply "rotate me to my intrinsic position". This
        doesn't mean "don't rotate me, I'm supposed to be upright", however,
        because *the standard representation of a character in a font is the
        one used in horizontal text*.

    Vertical Scripts

        Han and Kana and Hangul and Yi do need to be kept upright (0°
        rotation) because they use the same orientation in both horizontal and
        vertical text. Mongolian (and Ogham), however, rotate from one context
        to the other and so their glyphs must be rotated 90° from their
        horizontal orientation when used in vertical context. Part of the
        system's knowledge, therefore, needs to be which scripts need to be
        rotated and which merely translated into place. Given that and the
        script's directionality, the exact rotation can be derived as follows:

        System's Knowledge of Vertical Scripts' Properties

                                  Han/Hangul/Kana/Yi  Mongolian/Manchu  Ogham
         (canonical) horizontal
         directionality.........  LTR                 (LTR)             LTR
         vertical directionality  TTB                 TTB               BTT
         transformation.........  translation         rotation          rotation

        System's Derivation of Vertical Scripts' Orientation

                                Han/Hangul/Kana/Yi  Mongolian/Manchu  Ogham
        horizontal orientation
        (vector direction)....
           inline progression   90°                 90°               90°
           glyph orientation    0°                  0°                0°

        transformation........
           inline progression   rot 90°             rot 90°           rot -90°
           glyph orientation    static              rot (90°)         rot (-90°)

        vertical orientation
        (vector direction)....
           inline progression   180°                180°              0°
           glyph orientation    0°                  90°               270°
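
        The arithmetic behind the derivation can be checked with a short
        sketch. It reads the angles as clockwise bearings with 0° pointing
        up, which is an assumption inferred from the values in the table
        rather than something stated in the text; the names are illustrative.

```python
def rotate(angle, delta):
    """Rotate a bearing (degrees clockwise from 'up') by delta degrees."""
    return (angle + delta) % 360


# Horizontal orientation shared by all three groups: text flows towards
# the right (90°) and glyphs are upright (0°).
HORIZONTAL = {"inline": 90, "glyph": 0}

# Transformation applied when moving to vertical context.
# "static" (translation) is a rotation of 0° for the glyphs.
TRANSFORMS = {
    "Han/Hangul/Kana/Yi": {"inline": 90, "glyph": 0},
    "Mongolian/Manchu": {"inline": 90, "glyph": 90},
    "Ogham": {"inline": -90, "glyph": -90},
}


def vertical_orientation(group):
    """Apply the group's transformation to the horizontal orientation."""
    transform = TRANSFORMS[group]
    return {
        "inline": rotate(HORIZONTAL["inline"], transform["inline"]),
        "glyph": rotate(HORIZONTAL["glyph"], transform["glyph"]),
    }
```

        Under that reading, 180° is top-to-bottom and 0° is bottom-to-top,
        matching the vertical directionalities TTB and BTT listed for each
        group.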

    Horizontal Scripts

        For horizontal scripts, the method is "rotate me according to the
        relevant text-orientation style".

    For text-orientation: right or text-orientation: natural in a right-to-left
    block progression:

        Rotate horizontal scripts' grapheme clusters 90° to the right.

        Image: Glyphs rotated right
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/glyph-right.svg>

    For text-orientation: left or text-orientation: natural in a left-to-right
    block progression:

        Rotate horizontal scripts' grapheme clusters 90° to the left.

        Image: Glyphs rotated left
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/glyph-left.svg>

    For text-orientation: upright

        Keep glyphs for horizontal scripts upright and stack grapheme clusters
        vertically.

        Image: Glyphs translated upright
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/glyph-upright.svg>
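
        A sketch of the three cases as a single function. The name and the
        choice to express rotation in degrees clockwise are assumptions made
        for illustration.

```python
def horizontal_script_rotation(text_orientation, block_progression):
    """Rotation, in degrees clockwise, applied to each grapheme cluster
    of a horizontal script in vertical lines."""
    if block_progression not in ("rl", "lr"):
        raise ValueError("expected a vertical block progression: 'rl' or 'lr'")

    scheme = text_orientation
    if scheme == "natural":
        # Right-to-left line stacking behaves like 'right',
        # left-to-right line stacking like 'left'.
        scheme = "right" if block_progression == "rl" else "left"

    rotations = {
        "right": 90,    # 90° to the right (clockwise)
        "left": 270,    # 90° to the left (counter-clockwise)
        "upright": 0,   # clusters stay upright and are stacked vertically
    }
    if scheme not in rotations:
        raise ValueError(f"unknown text-orientation: {text_orientation!r}")
    return rotations[scheme]
```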

    Punctuation

        Transformations for punctuation are somewhat arbitrary and
        stylistic, so they should be handled with the vertical glyph variants
        given in the font, but only when the text has a vertical or
        bi-orientational directionality or when text-orientation-vertical is
        upright. (If the text is primarily horizontal text rotated sideways,
        then the punctuation should likewise be horizontal punctuation rotated
        sideways.)

    Character Shaping

        Character shaping is the process of selecting, based on context, which
        of several allographs of a letter should be used. This is typical of
        cursive scripts like Arabic and Mongolian, where the shape of a letter
        depends on whether it comes at the start of a word, in the middle of a
        word, or at the end of a word.

        Image: Diagram of Arabic shaping
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/shaping.svg>

        According to UAX 9, character shaping occurs _after_ BIDI reordering:
        the Arabic character shaped as an "initial" will always be on the
        right, even if the text is given a left-to-right override. This
        ensures that the letters always connect. (An initial form on the left
        side of the word would be trying to connect to nothing.)

        To deal with the multiple orientations of vertical layout, the shaping
        logic needs to know not just the reordered string of characters, but
        which side of the line is "up". If we turn the glyphs all upside-down,
        for instance, the shaping needs to be done in reverse.

        Image: Diagram of reverse (upside-down) shaping
               <http://fantasai.inkedblade.net/style/discuss/vertical-text/diagrams/reverse-shaping.svg>

        Because in vertical text Arabic and Mongolian can go in the same direction
        or in opposite directions, merely inverting the entire character string
        before passing it to standard Unicode shaping functions doesn't work.

        Shaping occurs only within each directional level run. Shaping is also
        constrained to runs of text in the same script; Mongolian characters,
        from Arabic's point of view, form as concrete a boundary as Latin ones
        do. It is therefore possible to break up the text into pieces that
        have characters from no more than one shaping-affected script without
        compromising the accuracy of the shaping. Then, for each run of text,
        one can use the shaping script characters' glyph orientation (derived
        above) to determine which way is "up" (0°) and hence which are the
        "left" (-90°) and "right" (+90°) sides of the text run. Once that's
        known the text run can be shaped, in reverse if necessary.
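
        One way to read the procedure above, sketched in Python. The function
        names are hypothetical, the script labels are assumed to be supplied
        by the caller, and the reverse-shaping test is my own interpretation:
        the reordered string runs along the line's inline progression, so it
        reads left-to-right from the glyphs' point of view only when that
        progression points at the glyphs' "right" (+90°) side.

```python
from itertools import groupby


def split_by_script(labelled_chars):
    """Split a sequence of (character, script) pairs into single-script runs.

    Hypothetical helper: returns a list of (script, text) tuples, so each
    run contains characters from no more than one script.
    """
    return [
        (script, "".join(ch for ch, _ in group))
        for script, group in groupby(labelled_chars, key=lambda pair: pair[1])
    ]


def shape_in_reverse(inline_progression, glyph_orientation):
    """Decide whether a run must be shaped in reverse.

    Angles are in degrees: 0 = up, 90 = right, 180 = down, 270 = left.
    """
    relative = (inline_progression - glyph_orientation) % 360
    if relative == 90:
        return False   # line runs toward the glyphs' right side
    if relative == 270:
        return True    # line runs toward the glyphs' left side
    raise ValueError("glyphs are not perpendicular to the line")
```

        For example, glyphs turned upside-down (180°) on an ordinary
        left-to-right line (90°) give a relative angle of 270°, so the run is
        shaped in reverse, matching the upside-down case described earlier.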

    Understanding Character Characteristics

        In addition to knowing the text, its primary directionality, and its
        styling properties, the implementation needs to know something about
        the characters themselves to be able to take advantage of the logical
        model. For each character, the following information must be available
        to the text layout algorithm:

          * horizontal directionality: ltr or rtl
            For vertical scripts this means the canonical directionality that
            is used for fonts and for plaintext horizontal layout.
          * vertical directionality: ttb, btt, or none
            For horizontal scripts, this is none.
          * glyph transformation between horizontal and vertical orientations:
            translation, rotation, or not applicable
            Only applies to scripts with vertical directionality.

        Unicode only specifies the horizontal directionality. For some scripts
        (not all), vertical directionality can be gleaned from the prose
        chapters describing the different writing systems. Glyph
        transformations are not given at all, only implied.
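
        As an illustration, the three properties listed above could be
        recorded like this. All class and table names are hypothetical, and
        the table is keyed by script rather than by character to keep the
        sketch short. The Han, Mongolian, and Ogham values come from the
        table of vertical scripts' properties earlier in this document; the
        Latin and Arabic rows are ordinary horizontal scripts added for
        contrast.

```python
from dataclasses import dataclass
from enum import Enum


class HorizontalDir(Enum):
    LTR = "ltr"
    RTL = "rtl"


class VerticalDir(Enum):
    TTB = "ttb"
    BTT = "btt"
    NONE = "none"          # horizontal scripts


class Transformation(Enum):
    TRANSLATION = "translation"
    ROTATION = "rotation"
    NOT_APPLICABLE = "n/a"


@dataclass(frozen=True)
class LayoutProperties:
    horizontal: HorizontalDir
    vertical: VerticalDir
    transformation: Transformation

    def __post_init__(self):
        # A transformation applies only to scripts with vertical directionality.
        has_vertical = self.vertical is not VerticalDir.NONE
        has_transform = self.transformation is not Transformation.NOT_APPLICABLE
        if has_vertical != has_transform:
            raise ValueError("transformation must match vertical directionality")


# Illustrative per-script table (an assumption, not Unicode data).
SCRIPT_PROPERTIES = {
    "Han": LayoutProperties(HorizontalDir.LTR, VerticalDir.TTB, Transformation.TRANSLATION),
    "Mongolian": LayoutProperties(HorizontalDir.LTR, VerticalDir.TTB, Transformation.ROTATION),
    "Ogham": LayoutProperties(HorizontalDir.LTR, VerticalDir.BTT, Transformation.ROTATION),
    "Latin": LayoutProperties(HorizontalDir.LTR, VerticalDir.NONE, Transformation.NOT_APPLICABLE),
    "Arabic": LayoutProperties(HorizontalDir.RTL, VerticalDir.NONE, Transformation.NOT_APPLICABLE),
}
```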

    Why and How the Unicode Consortium Should Be Involved

        The text layout model outlined in this document adds to the scope of
        the Unicode BIDI Algorithm and requires additional knowledge of
        character properties. This expansion should be part of Unicode rather
        than an alteration defined in CSS3. Standardizing it at the Unicode
        level rather than at the CSS level is more appropriate because
          * The Unicode Consortium has the expertise necessary to specify the
            character data correctly even for obscure scripts.
          * The extended data and algorithms operate at the same level as the
            existing Unicode specifications.
          * Non-CSS systems wanting to use this model will have a solid base
            to work off of rather than having to adapt bits of a high-level
            protocol (CSS3) to fit their application.
          * Standard Unicode APIs can be designed to handle the extended BIDI
            and shaping manipulations so that each application will not need
            to implement all of that itself.
          * Intermediary systems such as HTML can accommodate the model by
            building up from Unicode rather than down from CSS. (HTML would
            need the new direction values for its dir attribute.)
          * Unicode can add character-level support for
            vertical-directionality by defining directionality control codes
            to correspond with the vertical directionality requirements. This
            will allow the same plain text to be properly flowed into vertical
            layout contexts as well as horizontal ones.

    What happens if Unicode chooses not to standardize the additional character
    data

        I will be including the results of my personal research as a normative
        appendix to the next revision of CSS3 Text. Should the Unicode
        Consortium provide the necessary character data, I will publish a new
        version that removes the appendix and instead references the relevant
        sections of the Unicode Standard.

    About the Author and the Status of CSS3 Text

        I am an invited expert for the CSS Working Group at the World Wide Web
        Consortium (W3C) and the new editor of the CSS3 Text module. I intend
        to completely rewrite the Text Layout chapter for the next version of
        CSS3 Text based on the principles outlined herein.

    Acknowledgements

        Thanks go out to:

          * Martin Heijdra at the Gest Library, for his guidance, expertise,
            and enthusiasm. I had never imagined that the research staffer
            helping me find books would turn out to be an expert on
            international typography and Mongolian in particular.
          * Ian Hickson, the members of the www-style mailing list, the
            members of the CSS Working Group, and the contributors to the
            Mozilla Project for tempering my technical skills and CSS
            knowledge over the years.
          * The CSS Working Group for giving me a chance to fix everything I
            found wrong in the CSS3 Text drafts.
          * Last, but not least, Håkon Wium Lie and Opera Software <http://www.opera.com/>
            for supporting me in creating this work, to the point of even _paying_
            me to finish it. I only wish it hadn't taken so long so that I could
            spend more time on QA. ;)

    Appendix: Vertical Scripts in Horizontal Flow
    <http://fantasai.inkedblade.net/style/discuss/vertical-text/appendix>

    ################################################################################

    -- 
    http://fantasai.inkedblade.net/contact
    

