Re: locale-*independent* vi editor supporting UTF-8

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Mon Dec 07 1998 - 05:00:56 EST


On 1998-12-03 at 7:46, odonnell@zk3.dec.com wrote:
> According to the XPG4 vi man page, the current
> locale controls many aspects of vi's behavior, including the
> way strings are parsed into characters,
...
> Now, locales and encodings are two different things. POSIX.2,
> which defines the contents and syntax of locales, does not say
> anything about how characters are encoded, so it's perfectly fine
> to use UTF-8 as the encoding for any locale. And then for vi to
> operate under any UTF-8 locale.

UTF-8 (cf. <http://czyborra.com/utf/#UTF-8>) uses 1 through 3 bytes per BMP
character (1 through 4 bytes per Unicode character). In order to "parse
strings into characters", the processing program must undo the UTF-8
encoding. A program based on the 1-byte-amounts-to-one-character model
will not be able to sensibly handle UTF-8 encoded data.
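
To illustrate, here is a minimal sketch in Python, assuming well-formed
UTF-8 input. The helper name utf8_char_count is made up for this example.
It counts characters by skipping continuation bytes (bit pattern 10xxxxxx),
which is the least a byte-oriented program would have to do:

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters in well-formed UTF-8 data.

    Every character starts with exactly one lead byte; all following
    bytes of that character are continuation bytes of the form 10xxxxxx.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "Grüße".encode("utf-8")   # ü and ß each take two bytes
byte_count = len(sample)            # what a 1-byte-per-character program sees
char_count = utf8_char_count(sample)
print(byte_count, char_count)
```

The two numbers differ as soon as the text contains anything outside ASCII,
so a program that equates bytes with characters miscounts.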

Vi, like any other program, has to know about this encoding in order to
perform correctly; a classical, 8-bit based Vi implementation would not
even get the cursor position right with UTF-8 encoded data. At the very
least, Vi will have to take the UTF-8 mechanism into account when counting
characters and calculating cursor movements.

> It's easy to have a vi that processes UTF-8-encoded data.

In order to process data in various encodings, such as ISO 8859-1, UTF-8,
and Unicode (UTF-16), a program has to know about the encoding of the
actual data. Hence, I cannot understand how a program, such as Vi, could
work with a locale that does not cover the encoding.
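
As a small sketch of why the encoding must be known, the following Python
fragment interprets the same two bytes under three encodings. The variable
names are only illustrative; the codec names are those of Python's standard
codecs:

```python
# The same two bytes, interpreted under three different encodings.
raw = "é".encode("utf-8")            # b'\xc3\xa9'

as_utf8 = raw.decode("utf-8")        # one character: é
as_latin1 = raw.decode("iso-8859-1") # two characters: Ã and ©
as_utf16 = raw.decode("utf-16-be")   # one character: U+C3A9

for name, text in (("UTF-8", as_utf8),
                   ("ISO 8859-1", as_latin1),
                   ("UTF-16", as_utf16)):
    print(name, len(text), [hex(ord(c)) for c in text])
```

Nothing in the bytes themselves says which of the three readings is the
intended one; that information has to come from somewhere else.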

Please explain.

Best wishes,
   Otto Stolz


