Re: Unicode savvy concordance software?

From: Marc Wilhelm Küster
Date: Sat Apr 14 2001 - 03:35:44 EDT

Dear Herr Stolz,

Thanks for your good introduction to TUSTEP as a Unicode-savvy
concordance program. In fact, there is little to add, but since you asked
for one of the original culprits to step in, and since I am the one
to blame for pushing TUSTEP towards the UCS, here we go.

The general principle of character encoding in TUSTEP is based on markup,
specifically on script tagging. Thus, the Russian expression "Materialy
seminara po Platona" (to reuse my article's example and retype it in a
non-UCS-savvy mailer) is internally stored as #r+Materialy seminara po
Platona#r-, which can in turn be mapped to other markup schemes, e.g.
<cyr>Materialy seminara po Platona</cyr>. This internal system has been
more or less unchanged for the last thirty-something years and is set in
concrete due to the vast amount of legacy data in existence.
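
As a rough illustration of how such script tags might be mapped to another
markup scheme, here is a minimal Python sketch. It is not TUSTEP code: the
function name script_tags_to_xml and the table SCRIPT_ELEMENTS are
hypothetical, and only the pairing of #r+ / #r- with <cyr> is taken from the
example above.

```python
import re

# Hypothetical mapping from script codes to element names.
# Only "r" -> "cyr" comes from the example in the post; a real table
# would need one entry per supported script.
SCRIPT_ELEMENTS = {"r": "cyr"}

# Matches an opening (#r+) or closing (#r-) script tag.
_SCRIPT_TAG = re.compile(r"#([a-z])([+-])")


def script_tags_to_xml(text):
    """Rewrite #x+ ... #x- script tags as <elem> ... </elem>."""

    def replace(match):
        code, sign = match.group(1), match.group(2)
        element = SCRIPT_ELEMENTS.get(code)
        if element is None:
            # Unknown script code: leave the tag untouched.
            return match.group(0)
        return f"<{element}>" if sign == "+" else f"</{element}>"

    return _SCRIPT_TAG.sub(replace, text)


example = "#r+Materialy seminara po Platona#r-"
converted = script_tags_to_xml(example)
print(converted)
```

The sketch treats the tags as a purely textual substitution and does not
check that every opening tag has a matching closing tag.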

On entering the system, character strings are transformed from any of a
number of character encodings into the ASCII+markup internal encoding, and
vice versa on export. The UCS in its UTF-16 and UTF-8 incarnations figures
prominently among the supported encoding schemes.
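
The round trip described above could be sketched in Python roughly as
follows. This is only an illustration under strong assumptions: the
one-to-one transliteration table is invented for the example and covers just
the letters of one word, the names import_text and export_text are
hypothetical, and TUSTEP's real conversion tables are certainly richer.

```python
# Hypothetical, deliberately tiny transliteration table (Cyrillic -> ASCII).
# Code points are written as escapes so the example survives any mailer.
CYR_TO_ASCII = {
    "\u041c": "M", "\u0430": "a", "\u0442": "t", "\u0435": "e",
    "\u0440": "r", "\u0438": "i", "\u043b": "l", "\u044b": "y",
}
ASCII_TO_CYR = {latin: cyr for cyr, latin in CYR_TO_ASCII.items()}


def import_text(raw, encoding):
    """Decode bytes and rewrite runs of Cyrillic as #r+...#r- ASCII markup."""
    out = []
    in_cyr = False
    for ch in raw.decode(encoding):
        if ch in CYR_TO_ASCII:
            if not in_cyr:
                out.append("#r+")
                in_cyr = True
            out.append(CYR_TO_ASCII[ch])
        else:
            if in_cyr:
                out.append("#r-")
                in_cyr = False
            out.append(ch)
    if in_cyr:
        out.append("#r-")
    return "".join(out)


def export_text(internal, encoding):
    """Turn #r+...#r- markup back into Cyrillic and encode the result."""
    out = []
    in_cyr = False
    i = 0
    while i < len(internal):
        if internal.startswith("#r+", i):
            in_cyr = True
            i += 3
        elif internal.startswith("#r-", i):
            in_cyr = False
            i += 3
        else:
            ch = internal[i]
            out.append(ASCII_TO_CYR.get(ch, ch) if in_cyr else ch)
            i += 1
    return "".join(out).encode(encoding)


word = "\u041c\u0430\u0442\u0435\u0440\u0438\u0430\u043b\u044b"
internal = import_text(word.encode("utf-8"), "utf-8")
print(internal)
```

Both directions take the encoding as a parameter, so the same functions
serve UTF-8 and UTF-16; Python's "utf-16" codec handles the byte order mark
on decoding and writes one on encoding.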

The import/export mechanism covers the complete UCS, and the computing
centre has an ongoing Japanese project which publishes on the web. The
support for CJK is, however, rudimentary and does not include typesetting,
ordering, etc. TUSTEP
offers full support (including support for combining diacritics) for the
following scripts: Latin, Greek, Cyrillic, Hebrew, Arabic, Syriac
(Estrangelo), Coptic and Devanagari. TUSTEP has experimental support for

You can find more information in my short article which you have
already quoted. Please feel free to contact me on- or off-list if you need
more information.

For clarity it should be said that I no longer work for Tübingen
University's Computing Centre and that the views which are expressed in
this post are thus those of an outside company, which may or may not
coincide with those of the university.

                 Best regards,



Marc Wilhelm Küster
XML and Internationalization

Fronländer 22
D-72072 Tübingen

Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114

