Re: unicode and right-to-left

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Thu Jun 10 1999 - 04:57:57 EDT


Matan Ninio wrote on 1999-06-09 23:27 UTC:
> With all the impressive work that seems to go on in unicode and multilingual
> support, I haven't heard anyone mention right to left (Bidi) support. If I
> remember the standard right, for full (or even level 1) ISO 10646 support,
> Bi-directional should be supported.

I don't think the ISO 10646-1 standard talks a lot about bi-directional
support. In contrast to Unicode, ISO 10646-1 is only a character table
without much added semantics.

> I use Hebrew, and as such this aspect of unicode is very important to me (and
> many other people around me.) is there any work done on this subject? Can I
> help this effort by testing or even writing code?

As far as Unicode for X11 and Linux is concerned:

We now have UTF-8 support in xterm, the VT100 emulator that is most
commonly used to talk to non-GUI applications. Xterm treats ISO 10646-1
characters with exactly the same simple VT100 semantics as it treats ISO
8859-1 characters. The only new thing that UTF-8 support brought so far
is that you can now have thousands of characters as opposed to only 194
with ISO 8859-X. This is what you need to use Latin, Greek, Cyrillic,
Georgian, Armenian, Mathematics, IPA, etc. simultaneously in emails,
software source code, etc. Xterm currently does not do any bidi
support, nor does it handle combining characters or other rendering of
presentation forms, and it can currently support only mono-spaced fonts.
This limits the usefulness of the new UTF-8 support to probably a bit
more than half of the scripts supported by Unicode.
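To make the combining-character limitation concrete, here is a minimal
Python sketch. It is only an illustration under stated assumptions: the
helper name cell_count is hypothetical, and the canonical combining class
from the standard unicodedata module is used as a rough proxy for "this
code point should be overlaid on the previous one" (some combining marks
have class 0 and would be missed). It compares the number of cells a
terminal uses if it gives every code point its own cell with the number
it would use if combining marks shared the cell of their base character.

```python
import unicodedata


def cell_count(text):
    """Return (naive, combined) cell counts for text.

    naive:    one cell per code point, as in a terminal with no
              combining-character support.
    combined: code points with a non-zero canonical combining class
              are assumed to share the cell of the preceding character.
    """
    naive = len(text)
    combined = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return naive, combined


if __name__ == "__main__":
    samples = {
        "Latin": "abc",
        "Greek": "\u03b1\u03b2\u03b3",
        "e + combining acute": "e\u0301",
    }
    for name, s in samples.items():
        # Show the UTF-8 byte length next to the two cell counts.
        print(name, len(s.encode("utf-8")), cell_count(s))
```

For the last sample the two counts differ, which is exactly the case a
terminal with simple VT100 semantics gets wrong.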

I have not yet been in contact with anyone who had a very clear and
sufficiently simple vision of what right-to-left support for xterm should
look like. There are at least two competing standards. Both ISO 6429 and
Unicode specify their own right-to-left mechanisms, and it is unclear to
me which of these two standards is more appropriate in the context of an
ISO 6429 subset compliant VT100 terminal emulator such as xterm. It is
also not clear to me how the Unicode bidi algorithm, which as I understand
it works on rendering a stream of characters during output, applies to a
VT100 terminal with its numerous full-screen editing and cursor control
capabilities. There is also the alternative option that the bidi support
is done in the application software (which has to internally reverse
Hebrew strings) and output on xterms is then done only in classical
left-to-right mode. The advantage of this approach is that the author of
the editor has more detailed control over every aspect of right-to-left
support; the disadvantage is that right-to-left support will not be as
pervasive as if it were done in the terminal emulator, because simple
programs such as cat and ls are unlikely to be extended with bidi
functionality.
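
The application-side option can be sketched roughly as follows. This is
a deliberately simplified illustration and NOT the Unicode bidi
algorithm: it assumes a left-to-right base direction, reverses each run
of strong right-to-left characters (bidi classes R and AL, plus the
whitespace between them), and ignores digits, mirrored brackets, and
explicit embedding controls. The function names is_rtl and
to_visual_order are hypothetical; only unicodedata.bidirectional comes
from the Python standard library.

```python
import unicodedata

# Strong right-to-left bidi classes: Hebrew-style (R) and Arabic letters (AL).
RTL_CLASSES = {"R", "AL"}


def is_rtl(ch):
    """True if ch has a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in RTL_CLASSES


def to_visual_order(logical):
    """Convert a logical-order string to a left-to-right visual order.

    Simplification: each maximal run of RTL characters, including
    whitespace that lies between RTL characters, is reversed in place.
    Everything else is passed through unchanged.
    """
    out = []
    i, n = 0, len(logical)
    while i < n:
        if not is_rtl(logical[i]):
            out.append(logical[i])
            i += 1
            continue
        # Scan forward over RTL characters and whitespace, remembering
        # the last RTL character so trailing whitespace stays outside.
        j, last_rtl = i, i
        while j < n and (is_rtl(logical[j])
                         or unicodedata.bidirectional(logical[j]) == "WS"):
            if is_rtl(logical[j]):
                last_rtl = j
            j += 1
        out.append(logical[i:last_rtl + 1][::-1])
        i = last_rtl + 1
    return "".join(out)


if __name__ == "__main__":
    # ALEF, BET, GIMEL in logical order, embedded in Latin text.
    line = "abc \u05d0\u05d1\u05d2 def"
    print(to_visual_order(line))
```

An editor that sends the result of such a conversion to a plain
left-to-right xterm would display the Hebrew run in reading order, but
cat and ls would still print the unconverted logical order.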

Only a very small fraction of Unix developers (namely those in Israel
and Arabic countries) has a personal need for right-to-left support.
Linux support is not market-driven but itch-driven; that is, whenever
some developer is unhappy with something and feels the itch, then it
will get fixed rather quickly.

The best approach to get right-to-left support into Linux and X11 is to
first of all educate a broad developer community about the vision of what
this should look like in detail. For instance, should xterm be modified
or will editors and other applications have to reverse the strings?
Should ISO 6429 or Unicode bidi be used? How would this work in detail?
The Unicode bidi standard was not written with VT100 terminal editing
semantics in mind, and the bidi parts of ISO 6429 are not the most
readable specification on the planet. Is either the Unicode or the ISO
6429 bidi semantics implemented in some existing VT100 terminal variant
that is very popular among Unix users in Israel and that we simply
should emulate? A well-written paper or web page about these design
issues would be a big step forward. And then you will have to contact
numerous people and convince them that bidi support is an important and
good thing. You will probably also have to provide significant patches
for code yourself, because non-commercial users who are not users of
Hebrew or Arabic themselves might be sympathetic to your cause, but
probably will not feel enough of an itch to implement it all alone.

ISO 6429 = ECMA-48 is freely available from

  http://www.ecma.ch/
  ftp://ftp.ecma.ch/ecma-st/e048-pdf.pdf

The Unicode bidi algorithm is on

  http://www.unicode.org/unicode/reports/tr9/

I don't know of any literature on existing practice with bidi under Unix
and VT100.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


