New /etc/unicode POSIX system database

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Mon Nov 30 1998 - 17:15:33 EST


Kenneth Whistler wrote on 1998-11-30 20:12 UTC:
> This is under active consideration for a much revised and extended
> form of the Unicode Character Database data to accompany the release
> of the Unicode Standard, Version 3.0. However, do not expect it to
> simply be an additional field for the UnicodeData-X.Y.Z.txt file. The
> format and field content of that file have been fixed for long enough that
> there are multiple implementations out there that parse it with
> particular assumptions about its format. There is an ongoing discussion,
> but chances are that new data files will be introduced, with similar,
> but new formats, for additional information provided about characters
> in the future.

Excellent!

What would be extremely important is to define a file format for a
standard /etc/unicode database that we can hopefully soon expect to be
installed on every Unix/POSIX workstation. Many applications (editors,
debuggers, terminal emulators) will need a table that maps Unicode code
values to names. If I shift-click on a character A in xterm/emacs/etc., I
want to see a tiny window pop up that tells me whether A is "cyrillic
capital letter a" or "greek capital letter alpha". Such a table should
not be compiled into binaries; it should be provided globally as a
database file by standard distributions, and ftp.unicode.org should
provide the latest updated version whenever the standard is extended.

Just as /etc/services on any Unix machine contains a standard list of
all Internet port names, and /etc/protocols contains all protocol
identifiers, /etc/unicode should contain all character names, and
perhaps some auxiliary information as well (character category,
decomposition, etc.).

The UnicodeData-X.Y.Z.txt files are already very close to what I think
/etc/unicode should look like, but at least the Unicode 1.0 names should
be removed.
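
To make the idea concrete, here is a rough sketch of the name lookup an
application might do. It assumes that /etc/unicode keeps the
semicolon-separated layout of the UnicodeData files (code value first,
then character name, then category); that layout for /etc/unicode is my
assumption, not a settled format. The function names load_names and
describe are hypothetical, and the sample lines are only written in the
style of UnicodeData so that the sketch runs without the real file.

```python
import io

# Sample lines in the style of UnicodeData-X.Y.Z.txt (illustrative only).
# Field 0 is the code value in hex, field 1 the character name,
# field 2 the general category, field 10 the Unicode 1.0 name.
SAMPLE = """\
0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0391;GREEK CAPITAL LETTER ALPHA;Lu;0;L;;;;;N;;;;03B1;
0410;CYRILLIC CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0430;
"""


def load_names(lines):
    """Build a table mapping code value -> (name, category).

    The Unicode 1.0 name (field 10) is deliberately not kept.
    """
    names = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split(";")
        if len(fields) < 3:
            continue  # not a well-formed entry
        names[int(fields[0], 16)] = (fields[1], fields[2])
    return names


def describe(ch, names):
    """Return the text a pop-up window might show for character ch."""
    entry = names.get(ord(ch))
    if entry is None:
        return "U+%04X <unknown>" % ord(ch)
    name, category = entry
    return "U+%04X %s (%s)" % (ord(ch), name.lower(), category)


# A real application would read the system-wide file instead, e.g.
# open("/etc/unicode"); that path is the proposal, not an existing file.
NAMES = load_names(io.StringIO(SAMPLE))
print(describe("\u0410", NAMES))
print(describe("\u0391", NAMES))
```

Because the table is read at run time, an updated database file would
change what every application displays without recompiling any of them.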

Markus

PS: There was a strange header in the last two postings:

  Reply-to: considered_harmful@unicode.org

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


