Re: bidi support for xterm

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Tue Aug 17 1999 - 11:06:16 EDT


Markus Kuhn wrote:
> Frank da Cruz wrote on 1999-08-16 17:15 UTC:
> > If xterm is to handle Hebrew or other RTL writing systems, it should do so
> > in the same way as the terminal it is emulating, because it's a terminal
> > emulator.
>
> I might to some degree agree with your conclusion for Hebrew (because
> RTL scripts are perhaps too big a hassle in general), but I do not agree
> with the more general idea that a terminal emulator should only do what
> historic 1970s terminals did. There is a real need to extend and
> modernize the dumb VT100/ISO 6429 terminal semantics, which has proven
> to be highly practical and useful, for at least some subset of Unicode,
> and I personally believe that this is very feasible for
>
> - UTF-8 (one or more bytes = 1 character)
> - Asian wide characters (one character = two char cells)
> - combining characters (up to say 3 characters = one char cell)
>
> all handled according to strictly standardized algorithms that the host
> application can predict easily.
>
Agreed -- obviously character-set extensions are needed and in fact have
been in place in Kermit for years -- see, for example:

  http://www.columbia.edu/kermit/kanji.html

circa 1992, which shows single- and double-width Roman and Kanji/Kana
characters on a plain old DOS screen (the screen shots were shrunk using xview
and some detail was lost, but you get the idea).

My point was that a terminal is a device that is controlled only by the host
(and the user's fingers), so the host and the human user have complete control
over the appearance of the screen at any moment. Any "intelligence" that we
build into a terminal that invalidates the host's knowledge of the exact
contents of the screen makes it something else, not a terminal. More like a
Web browser.

The protocol between the terminal and the host must be well-defined. VT100,
VT102, VT220, and so on are well-defined. Any emulator of one of these
terminals must be compatible with its specification.

Nowadays, when actual physical terminals are hard to find, the majority of
commercial and shareware terminal emulators are written without reference to
a real terminal, or even to its specification -- more by guesswork, copying
other emulators, etc, and this is a bad situation because now host-based
applications are starting to appear that claim to use (say) a VT100 when in
fact they are based on the imperfect emulation of "FooCom 2.1" and therefore
do not work with a real VT100, or with other emulators.

Despite the passing of physical terminals, the terminal-host mode of
communication is more important than ever in terms of sheer numbers. We tend
to ignore it because it's done and it works so well, and focus all our
attention on Web browsers. But yes, the terminal-host model can use some
updating for recent developments, especially Unicode, in several distinct
scenarios:

 1. Traditional non-Unicode host (VMS, UNIX, IBM mainframe, etc) and
    a Unicode-based terminal emulator. In this case the emulator must use
    Unicode fonts to make the screen look like it would have with the real
    terminal. So far this is not possible because some of the needed
    characters are still missing from Unicode; they were proposed here last
    year. See:

      http://www.columbia.edu/kermit/standards.html

    (The Unicode proposals are at the bottom).

 2. Hosts that use Unicode (e.g. UTF-8) in the host-terminal data stream,
    such as Plan 9 and (soon) Linux.

For the latter scenario, first let's separate code issues from presentation
ones. Just as we currently support many character sets not supported by the
original terminals, we can support UTF-8 easily enough with our octet-oriented
communications gear and software, conveniently skirting issues of endianness,
and convert between it and other character sets when needed.
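
A minimal sketch of the code-conversion step, in Python. ISO 8859-8 stands in
for "other character sets" purely as an example, and the function names are
made up for illustration; nothing here is taken from Kermit itself.

```python
# Sketch: convert an octet stream between a legacy 8-bit charset and UTF-8.
# ISO 8859-8 (Hebrew) is an assumed example; any codec name would do.

def legacy_to_utf8(data: bytes, charset: str = "iso8859_8") -> bytes:
    """Decode octets from a legacy charset and re-encode them as UTF-8."""
    return data.decode(charset).encode("utf-8")


def utf8_to_legacy(data: bytes, charset: str = "iso8859_8") -> bytes:
    """Decode UTF-8 octets and re-encode them in a legacy charset.

    Characters the legacy charset cannot represent are replaced by '?'.
    """
    return data.decode("utf-8").encode(charset, errors="replace")


# shin, lamed, vav, final mem in ISO 8859-8 (logical order)
shalom_8859_8 = bytes([0xF9, 0xEC, 0xE5, 0xED])
shalom_utf8 = legacy_to_utf8(shalom_8859_8)
```

Because both sides are plain octet streams, byte order never enters into it:
each Hebrew letter above becomes two UTF-8 octets, and the round trip through
utf8_to_legacy() gives back the original four.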

BIDI, combining characters, and so forth are separate issues. In the Kermit
project, as much as we would like to be out front, we are small in number and
can barely keep up with current demands. For Hebrew, this means supporting
current host-terminal applications such as ALEPH, Hebrew Vi, HEDT, and so
forth, which use current capabilities of well-defined terminals: ISO 2022
charset designation and invocation, direct cursor placement, and (for VT220
and above) switching of screen-writing direction. As is appropriate for the
terminal-host model, these applications control the placement of characters
on the screen.
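
As an illustration of what "the application controls the placement" means, here
is a small Python sketch in which the host writes right-to-left text by
positioning the cursor before every character with the standard CUP sequence
(CSI row ; col H). The helper names are hypothetical, and the sketch assumes
purely RTL text in which every character fills exactly one cell.

```python
ESC = "\x1b"


def cup(row: int, col: int) -> str:
    """CUP: move the cursor to the 1-based (row, col) position."""
    return f"{ESC}[{row};{col}H"


def place_rtl(text: str, row: int, right_col: int) -> str:
    """Return the octets-to-be for drawing logically ordered RTL text.

    The first character goes at right_col and each following character
    one cell further left. Assumes one cell per character.
    """
    if right_col - len(text) + 1 < 1:
        raise ValueError("text does not fit to the left of right_col")
    out = []
    for i, ch in enumerate(text):
        out.append(cup(row, right_col - i) + ch)
    return "".join(out)
```

The terminal does no reordering at all here; the host knows which cell each
character landed in because it chose the cell itself.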

Combining characters are a bridge we will cross when we come to it. As yet
we are unaware of any Unicode-based terminal-host application that uses them.

Can a new terminal be designed that behaves more like a Web browser, allowing
the host application to simply send a Unicode data stream, with the terminal
handling the BIDI presentation issues? Sure, but we would have to think
carefully about the implications. Would it still be a terminal? Only if the
host was still able to know the exact contents of every cell on the screen at
each moment, which is required for all but the simplest TTY scrolling
applications. Tables still need to line up, EMACS and Vi still need to work,
arrow keys and Tab need to work in fullscreen data-entry applications, and on
and on...
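
To make the "exact contents of every cell" requirement concrete, here is a
rough Python sketch of how a host might predict how many cells a string
occupies under the rules quoted above (wide characters take two cells,
combining characters take none). It is a simplification built on Python's
unicodedata module, not a standardized algorithm: control characters and
zero-width format characters are ignored.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Predicted number of character cells for one code point."""
    if unicodedata.combining(ch) != 0:
        return 0  # combining mark: shares the cell of its base character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian wide or fullwidth: two cells
    return 1


def display_width(text: str) -> int:
    """Predicted number of cells the whole string occupies."""
    return sum(cell_width(ch) for ch in text)
```

A table column lines up only if the host and the terminal agree on this
function for every character that can appear in it.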

It might be possible but I worry that the specification would be so
complicated that even if we implemented it in an "emulator", the corresponding
host applications might be too difficult to create. This is very much an
"if we build it they will come" scenario.

On the other hand, automatic handling of BIDI by the terminal would be an
enormous boon to applications such as plain-text email, netnews, etc, which
until now have been next to impossible for Hebrew. So here's a tiny germ of an
idea: an ISO 6429 escape sequence to switch the terminal's BIDI handling
on/off. A terminal might be in BIDI mode by default (so normal shell-level
commands and text display would "just work" with Hebrew), and would be
switched out of it by any application like EMACS that needed precise control
of the screen contents, and then restored upon exit to its previous state.
Thus the state would have to be queryable, etc.
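
A toy Python model of that idea follows. The sequence shapes are borrowed from
the DEC private mode set/reset (CSI ? Pn h / CSI ? Pn l) and mode report
(CSI ? Pn $ p, answered by CSI ? Pn ; Ps $ y) conventions, but the mode number
9999 and all the names are hypothetical; no such mode is defined here or
assumed to exist in any terminal.

```python
from contextlib import contextmanager

BIDI_MODE = 9999  # hypothetical mode number, not assigned by any standard
CSI = "\x1b["
SET = f"{CSI}?{BIDI_MODE}h"     # turn terminal-side BIDI handling on
RESET = f"{CSI}?{BIDI_MODE}l"   # turn it off
QUERY = f"{CSI}?{BIDI_MODE}$p"  # ask for the current state


class ToyTerminal:
    """Stand-in for a terminal that tracks only the BIDI switch."""

    def __init__(self) -> None:
        self.bidi = True  # on by default, so shell-level text "just works"

    def feed(self, seq: str) -> str:
        """Process one control sequence; return the reply, if any."""
        if seq == SET:
            self.bidi = True
        elif seq == RESET:
            self.bidi = False
        elif seq == QUERY:
            return f"{CSI}?{BIDI_MODE};{1 if self.bidi else 2}$y"
        return ""


@contextmanager
def bidi_disabled(term: ToyTerminal):
    """What a fullscreen application would do: query, disable, restore."""
    was_on = term.feed(QUERY) == f"{CSI}?{BIDI_MODE};1$y"
    term.feed(RESET)
    try:
        yield term
    finally:
        term.feed(SET if was_on else RESET)
```

An editor would wrap its whole session in bidi_disabled(); on exit the
terminal returns to whatever state the query reported, whether that was on or
off.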

I've limited the discussion to Hebrew only because I know next to nothing
about Arabic or other RTL writing systems.

- Frank


