LINE/PAGE SEPARATOR semantics in terminal emulators

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Mon Nov 02 1998 - 18:20:54 EST


Mark Davis wrote on 1998-11-02 19:10 UTC:
> http://www.unicode.org/unicode/reports/techreports.html
> UTR #13: Unicode Newline Guidelines

OK, let's open another highly interesting can of worms here:

How are terminal emulators supposed to handle the LINE SEPARATOR
(U+2028) and PARAGRAPH SEPARATOR (U+2029) characters?

A number of classical VT100-style terminal emulators (xterm, kermit,
Linux console) are right now being extended to support UTF-8, or have
already been extended. As far as I know, no special consideration is
currently given to LS and PS in these terminal emulators. At least I
gave them none when I added UTF-8 to the Linux console, and after reading
UTR #13, I suddenly feel some urge to fix this properly. But how?

On platforms such as Unix, the semantics of plain-text files are closely
related to the semantics of the terminal (emulator). If I type "cat
filename", then I expect the file to be displayed in some adequate way.
That has worked for the last 25 years, and it is certainly not a bad idea
to preserve this in the future.

I think we should get a few brains together and write a Unicode
technical report about VT100/ISO 6429 terminal emulators and what has to
be considered when they are extended to support Unicode/UTF-8.

I suggest that such a technical report should roughly specify the
following cursor control actions to be associated with these control
characters (talking about left-to-right mode here only at the moment):

CR: column=1
LF: column=1, row++
FF: column=1, row++
LS: column=1, row++
PS: column=1, row+=2
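
To make the table above concrete, here is a minimal sketch of a cursor
model that applies exactly these five actions. It is only an
illustration under stated assumptions: the names ROW_ADVANCE and advance
are hypothetical, every other character is assumed to occupy one cell,
and line wrapping, scrolling, and escape sequences are ignored.

```python
# Hypothetical sketch of the proposed cursor actions (left-to-right mode only).
CR, LF, FF = "\r", "\n", "\f"
LS, PS = "\u2028", "\u2029"

# Rows to advance for each control character; all five also set column=1.
ROW_ADVANCE = {CR: 0, LF: 1, FF: 1, LS: 1, PS: 2}


def advance(text, row=1, column=1):
    """Return the (row, column) cursor position after feeding text."""
    for ch in text:
        if ch in ROW_ADVANCE:
            column = 1
            row += ROW_ADVANCE[ch]
        else:
            # Assumption: one cell per character, no wrapping or scrolling.
            column += 1
    return row, column
```

With this model, advance("ab" + PS + "c") leaves the cursor on row 3,
so one empty row separates the two paragraphs.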

Representing a PARAGRAPH SEPARATOR as an empty line when I dump a plain
text file to a terminal emulator seems to me to be a quite acceptable
convention. A PARAGRAPH SEPARATOR could also nicely be represented in
text processors (emacs, vi, etc.) by an empty line over which the cursor
always jumps. FF could also be represented in a more fancy way as an
empty line with some page boundary indication visible in it (the HTML
equivalent would be <HR>).

Opinions?

Another issue that a Unicode terminal emulation report should address
(and I am not very familiar with this area) is the relationship between
the Unicode bidi algorithm and the ISO 6429 bidi algorithm in these
applications. Shall both be implemented, or only one, and if both, how
do they interact?

I can also think of some good requirements for cut&paste functions in
terminal emulators, such that when I cut&paste text from an xterm to an
editor, control characters such as PS and LS are not lost.
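
One way to meet such a requirement, sketched here purely as an
assumption and not as what any existing emulator does, is to remember
which character ended each logical line and to emit it again when a
selection is copied. The names feed and copy_lines are hypothetical, and
CR handling is left out to keep the sketch short.

```python
# Hypothetical sketch: keep each line's terminator so copying can restore it.
LS, PS = "\u2028", "\u2029"
TERMINATORS = {"\n", "\f", LS, PS}


def feed(text):
    """Split text into (content, terminator) pairs, one per logical line."""
    lines, current = [], []
    for ch in text:
        if ch in TERMINATORS:
            lines.append(("".join(current), ch))
            current = []
        else:
            current.append(ch)
    if current:
        # Trailing text with no terminator yet.
        lines.append(("".join(current), ""))
    return lines


def copy_lines(lines, first, last):
    """Return lines first..last (0-based, inclusive) with their original terminators."""
    return "".join(content + term for content, term in lines[first:last + 1])
```

Copying the first two lines of feed("one" + LS + "two" + PS + "three\n")
then yields "one", LS, "two", PS again rather than two LF characters.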

Many of the Unicode semantics questions that are still open for terminal
emulators are also still open for simple text processors (emacs, vi,
etc.) used for writing program source code, etc.

Who would be interested in doing some pioneer work in these topics?

> UTR #16: EBCDIC-Friendly UCS Transformation Format

Scary. No Halloween joke? ;-)

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


