Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))

From: Mark E. Shoulson (mark@kli.org)
Date: Fri May 14 2004 - 13:25:50 CDT

    E. Keown wrote:

    > Elaine Keown
    > Tucson
    >
    >Dear Peter,
    >
    >
    >
    >>>>*plain text* standard is the bidirectional
    >>>>algorithm, which sorts out how a (horizontal)
    >>>>*line* of text is laid out when text of opposite
    >>>>directions
    >>>>
    >>>>
    >
    >In the 'old' Unicode 3.0 there was a one-line note on
    >doing boustrophedon near the bidi material.
    >Boustrophedon is needed not 'just' in Archaic Greek,
    >but also in some periods of Egyptian and in some early
    >Semitic stuff.
    >
    >For a small percentage of early Semitics stuff, it
    >would be convenient to be able to automatically
    >reverse the direction in a database, so the retrieval
    >algorithm could look at 'both directions.'
    >
    That shouldn't be a problem, not even an issue. Remember, no matter
    which direction the text runs on the page, Unicode text is stored in
    logical order, not visual order. So a huge text that happens to be
    rendered boustrophedon is still stored as a sequence of characters in
    reading order. So you don't need to "reverse" the direction of anything
    when you're searching. If you're looking for "herman", the letters will
    be in exactly that order no matter which line of the text it wound up on.
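
    A minimal sketch of that point, in Python. The sample string, the line
    width, and the helper name visual_lines are all made up for
    illustration; the only claim is that a search over the stored
    (logical-order) text is unaffected by how the lines later fall.

```python
def visual_lines(stored: str, width: int) -> list[str]:
    """Hypothetical display step: cut logical-order text into fixed-width
    lines and reverse every other one, as a boustrophedon renderer might."""
    lines = [stored[i:i + width] for i in range(0, len(stored), width)]
    return [line if n % 2 == 0 else line[::-1] for n, line in enumerate(lines)]


# Text as stored: one stream of characters in reading order.
stored = "thescribehermanwrotethis"

# Searching the stored text needs no knowledge of line direction.
position = stored.find("herman")

# What a reader would see at 12 characters per line; "herman" happens
# to straddle the line break and the second line runs the other way.
displayed = visual_lines(stored, 12)
found_on_a_displayed_line = any("herman" in line for line in displayed)

print(position)                   # 9
print(displayed)                  # ['thescribeher', 'sihtetorwnam']
print(found_on_a_displayed_line)  # False
```

    The search only gets hard if you store what is displayed rather than
    what is read, which is exactly what logical order avoids.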

    >Is there a larger 'boustrophedon' note in Unicode 4.0?
    > Is there any interest in expanding the bidi algorithm
    >to definitely cover all possible RTL - LTR
    >boustropheda (plural?) ?
    >
    Boustrophedon is probably outside the scope of unmarked Unicode. Which
    is not as bad as it sounds. So far as a computer is concerned, text is
    a stream of characters, in logical reading order. There is none of this
    silly "lines" business or reversing of directions, even if some of the
    characters are newline characters; none of that means anything in terms
    of how the data is stored. It's only when the data is *rendered* on a
    screen or on paper that the bidi algorithm takes over and dictates where
    to put the various marks. The bidi algorithm is enough of a headache as
    it stands, just trying to deal with RTL and LTR scripts and their
    possible coexistence on a single line. Boustrophedon is far too complex
    for it. Probably what you'd do is have some higher-level markup tag
    saying "Begin boustrophedon here..." which your renderer would know to
    interpret properly: as it breaks the text into lines, reverse every
    other one, etc etc... You'd have stuff like "<boust></boust>" tags or
    something equivalent. The same goes for all the various possible
    variants of boustrophedon, and whatever other exotic directions happen.
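
    A rough sketch of such a renderer, in Python, under loud assumptions:
    the "<boust>" tag is only the placeholder name used above and not part
    of any real markup standard, lines are broken at a fixed character
    width, and "reversing" a line is done naively per code point (a real
    renderer would have to keep combining marks with their bases and
    might mirror glyphs as well).

```python
import re

# Hypothetical markup: <boust>...</boust> marks a boustrophedon run.
BOUST = re.compile(r"<boust>(.*?)</boust>", re.DOTALL)


def wrap(text: str, width: int, boustrophedon: bool) -> list[str]:
    """Break text into fixed-width lines; optionally reverse every other one."""
    lines = []
    for n, start in enumerate(range(0, len(text), width)):
        line = text[start:start + width]
        if boustrophedon and n % 2 == 1:
            # Naive code-point reversal; stored text is never modified.
            line = line[::-1]
        lines.append(line)
    return lines


def layout(text: str, width: int) -> list[str]:
    """Lay out logical-order text, honouring the hypothetical <boust> tags."""
    lines, pos = [], 0
    for match in BOUST.finditer(text):
        lines += wrap(text[pos:match.start()], width, boustrophedon=False)
        lines += wrap(match.group(1), width, boustrophedon=True)
        pos = match.end()
    lines += wrap(text[pos:], width, boustrophedon=False)
    return lines


for line in layout("abc<boust>ABCDEFGH</boust>xyz", 3):
    print(line)
```

    The direction flipping lives entirely in the layout step; the string
    handed to layout() is still in reading order from start to finish.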

    >The discussion so far on the list doesn't appear to me
    >to cover every possibility....my impression is that
    >there are probably sub-varieties of boustrophedon and
    >of the vertical material....sometimes individual
    >characters get re-aligned, turned a certain number of
    >degrees, and maybe sometimes they don't.
    >
    That's okay. Things like that are outside of plain Unicode's
    capabilities. Other standards (XML stuff, whatever) need to be
    developed to handle them.

    ~mark


