"H. Peter Anvin" wrote on 1999-01-12 19:41 UTC:
> > > I dropped kernel work on it after 1.3.0, when it looked like hpa was
> > > taking over. However, roughly speaking nothing happened afterwards,
> > > and the current situation is far from satisfactory.
> >
> > Same here.
>
> Yes, I think we've all rather suffered from the confusion of
> responsibility. I myself have been reluctant to go in because I felt
> I would step on your toes...
Ah, thanks for making this clear. Another misunderstanding resolved,
because I had assumed that you had taken over.
I am *very* happy to hereby release any responsibility whatsoever that I
might still have had for the UTF-8 code in the Linux console driver to
you.
I will work on other UTF-8 fronts, e.g. the -fixed-*-iso10646-1 fonts
plus overall stimulation/coordination of various other Linux UTF-8
projects. Please consider yourself hereby to be fully in charge of UTF-8
and Unicode in the Linux console. Please do not worry about breaking any
backwards compatibility if this is necessary for a clean design, because
practically nobody understands or uses the existing UTF-8 console
support today (and I have the impression that the keyboard support is
currently broken with nobody even noticing it).
Suggestions of what can be fixed rather quickly (maybe even in 2.2?):
- Please make clear in the documentation or comments that you are
now maintaining the UTF-8 aspects of the console.
- Please remove the old ESC % 8 activation code for UTF-8. I had
introduced this only as a temporary hack since at that time the
now official ESC % G was not yet defined by ISO/ECMA. I hate to
see it being left in there for the next few decades in the name of
unnecessary compatibility ... :-)
- Please make sure that the ISO 2022 ESC codes switch both the console
display and the console keyboard simultaneously. This is also what
all other ISO 2022 terminal emulators (kermit, xterm) are expected
to do.
- Please add the three officially registered ISO 2022 ESC sequences
ESC % / G
ESC % / H
ESC % / I
as alternatives for ESC % G, but with the difference that when the
switch to UTF-8 was done with one of these three, no return with ESC % @
is possible (see <http://www.itscj.ipsj.or.jp/ISO-IR/>). These three
ESC sequences announce the three levels of ISO 10646-1, but since for
a terminal emulator the ISO 10646-1 implementation levels do not
make any difference, just handle all three sequences as synonyms.
It is nice to be able to permanently disable ISO 2022 for the remaining
session, such that accidental binary dumps can't switch into an
uncontrolled ISO 2022 state any more.
- At the moment, illegal UTF-8 characters are silently ignored.
I now believe that this is neither a good idea (makes debugging
more difficult) nor in conformance with ISO 10646-1. Illegal
UTF-8 sequences such as 0xfe, 0xff, and unexpected or missing
10xxxxxx sequences should be indicated by the REPLACEMENT CHARACTER
as specified in ISO 10646-1 section R.7 "Incorrect sequences of
octets: Interpretation by receiving devices" (see
<ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/ISO-10646-UTF-8.html>).
- In my original code certain non-spacing Unicode characters such as
the "zero-width no-break space" were completely ignored and did not
advance the cursor by one position. I now believe that this was a
bad idea. *Every* graphical Unicode character should advance the
cursor by exactly one cell in a VT100 emulator (we can't
handle two cell wide East Asian characters in VGA text mode), and
if the application wants to ignore a few characters such as U+FEFF
or U+200B-200D for output on a VT100 terminal as non-spacing ones,
then the application shall have the sole responsibility for removing
these characters, and not the terminal emulator. Anything else
would just make debugging and compatibility much more difficult.
This is also what I will suggest to the authors of other UTF-8
terminal emulators.
- Unicode introduces two new control codes that the console driver is
not handling at the moment. I suggest handling them as follows:
    U+2028 LINE SEPARATOR        handle just like CR LF
    U+2029 PARAGRAPH SEPARATOR   handle just like CR LF LF
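
The ESC-sequence suggestions in the list above can be sketched as a small
state machine. This is only an illustration in Python, not the console
driver's code: the class Utf8Switch, its attributes, and its property
names are hypothetical, and every escape sequence other than ESC % ...
is ignored for brevity. It assumes one shared flag for display and
keyboard, no support for the old ESC % 8, and a lock that is set by
ESC % / G, ESC % / H, and ESC % / I.

```python
ESC = 0x1B


class Utf8Switch:
    """Hypothetical model of the UTF-8 on/off switching discussed above."""

    def __init__(self):
        self.utf8 = False      # one flag, shared by display and keyboard
        self.locked = False    # set once ESC % / G, H, or I has been seen
        self._state = "ground"

    @property
    def display_is_utf8(self):
        return self.utf8

    @property
    def keyboard_is_utf8(self):
        # Same flag as the display, so both always switch together.
        return self.utf8

    def feed(self, data):
        for byte in data:
            self._step(byte)

    def _step(self, byte):
        if byte == ESC:
            # Any ESC restarts sequence recognition.
            self._state = "esc"
            return
        char = chr(byte)
        if self._state == "esc":
            self._state = "percent" if char == "%" else "ground"
        elif self._state == "percent":
            if char == "/":
                self._state = "slash"
                return
            if char == "G":
                self.utf8 = True
            elif char == "@" and not self.locked:
                self.utf8 = False
            # Anything else, including the old "8", is not recognised.
            self._state = "ground"
        elif self._state == "slash":
            if char in "GHI":
                # The three levels are treated as synonyms; no way back.
                self.utf8 = True
                self.locked = True
            self._state = "ground"
```

With this sketch, feeding ESC % G and then ESC % @ turns UTF-8 on and off
again, while after ESC % / H a later ESC % @ has no effect.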
Most of these suggestions apply also to other VT100 terminal emulators
(kermit, xterm, etc.). Maybe I'll set up a web page covering such
suggested conventions for UTF-8 capable VT100 emulators.
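
The decoding and cursor suggestions above can be illustrated with a short
Python sketch. It is not taken from any existing emulator, and several
points are assumptions: the names decode and Screen are hypothetical, one
U+FFFD is emitted per malformed sequence or stray byte (one possible
reading of section R.7), lead bytes 0xF8 and above are all rejected
because Python strings stop at U+10FFFF, overlong forms are rejected,
line wrapping is simplified, and C0 controls other than CR and LF are
dropped.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode(data):
    """Decode bytes as UTF-8, marking malformed input with U+FFFD."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        i += 1
        if b < 0x80:
            out.append(chr(b))
            continue
        if 0xC0 <= b <= 0xDF:
            need, cp, low = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            need, cp, low = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            need, cp, low = 3, b & 0x07, 0x10000
        else:
            # Unexpected 10xxxxxx byte, or 0xF8..0xFF (includes 0xFE, 0xFF).
            out.append(REPLACEMENT)
            continue
        ok = True
        for _ in range(need):
            if i < n and (data[i] & 0xC0) == 0x80:
                cp = (cp << 6) | (data[i] & 0x3F)
                i += 1
            else:
                ok = False  # missing continuation byte; do not consume it
                break
        if not ok or cp < low or cp > 0x10FFFF:
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
    return "".join(out)


class Screen:
    """Hypothetical cell grid: every graphical character takes one cell."""

    def __init__(self, columns=80):
        self.columns = columns
        self.row = 0
        self.col = 0
        self.cells = {}  # (row, col) -> character

    def feed(self, data):
        for ch in decode(data):
            self.put(ch)

    def put(self, ch):
        if ch == "\r":
            self.col = 0
        elif ch == "\n":
            self.row += 1
        elif ch == "\u2028":          # LINE SEPARATOR: like CR LF
            self.put("\r")
            self.put("\n")
        elif ch == "\u2029":          # PARAGRAPH SEPARATOR: like CR LF LF
            self.put("\r")
            self.put("\n")
            self.put("\n")
        elif ch < " " or ch == "\x7f":
            return                    # other controls are out of scope here
        else:
            if self.col >= self.columns:
                self.col = 0          # simplified wrap, not real VT100 rules
                self.row += 1
            # No special case for U+FEFF or U+200B-200D: one cell each.
            self.cells[(self.row, self.col)] = ch
            self.col += 1
```

For example, feeding the bytes for "a", U+FEFF, "b" to a fresh Screen
leaves the cursor three cells to the right, and a following U+2029 moves
it to column 0 two rows further down.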
> Yes, it's a mess right now. Especially with fbcon simulating VGA
> limitations and all... yuck. I'm sort of interested in what the
> KGIcon people have been up to, as well.
>
> I also believe the choice of ioctl()s as the setting mechanism for a
> lot of these things was really bad. It makes it hard to implement
> things out of the kernel where appropriate.
Agreed.
> Anyway, I think we need to decide what we want to do in 2.3 and do
> it. If that means a rewrite from scratch, I'm still game...
Great.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>