Dear Herr Stolz,
Thanks for your good introduction to TUSTEP as a Unicode-savvy 
concordance program. In fact, there is little to add, but since you asked 
for one of the original culprits to step in, and since I am the one 
to blame for pushing TUSTEP towards the UCS, here we go.
The general principle of character encoding in TUSTEP is based on markup, 
concretely on script tagging. Thus, the Russian expression "Materialy 
seminara po Platona" (to reuse my article's example and retype it in a 
non-UCS-savvy mailer) is internally stored as #r+Materialy seminara po 
Platona#r-, in turn mappable to other markup schemes, e. g. <cyr>Materialy 
seminara po Platona</cyr>. This internal system has remained more or less 
unchanged for the last thirty-something years and is set in concrete due 
to the vast amount of legacy data in existence.
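
To illustrate the mapping between the two markup schemes, here is a minimal 
Python sketch. It is not TUSTEP code: the function name to_xml_markup and 
the table SCRIPT_TAGS are hypothetical, and the only pair taken from the 
example above is "r" -> "cyr". Any further script codes would be 
assumptions, so unknown tags are left untouched.

```python
import re

# Only the "r" -> "cyr" pair comes from the example in this message;
# any further entries would be assumptions.
SCRIPT_TAGS = {"r": "cyr"}

# Matches a script tag such as "#r+" (switch on) or "#r-" (switch off).
_TAG = re.compile(r"#([a-z])([+-])")


def to_xml_markup(text: str) -> str:
    """Rewrite #x+ ... #x- script tags as <name> ... </name>."""

    def replace(match: re.Match) -> str:
        code, sign = match.group(1), match.group(2)
        name = SCRIPT_TAGS.get(code)
        if name is None:
            return match.group(0)  # unknown script code: leave as-is
        return f"<{name}>" if sign == "+" else f"</{name}>"

    return _TAG.sub(replace, text)


print(to_xml_markup("#r+Materialy seminara po Platona#r-"))
```

Keeping the table separate from the substitution logic means a different 
target markup scheme only needs a different table.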
On entering the system, character strings are transformed from any of a 
number of character encodings into the internal ASCII+markup encoding, and 
vice versa on export. The UCS, in its UTF-16 and UTF-8 incarnations, figures 
prominently among the supported encoding schemes.
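
The import direction can be sketched roughly as follows, again in Python 
and again only as an illustration under stated assumptions. The function 
import_text and the table TRANSLIT are hypothetical; the table covers just 
the letters of the example phrase, and TUSTEP's actual internal 
transliteration is not described in this message. The sketch decodes UTF-8 
or UTF-16 input and wraps each run of Cyrillic text in #r+ ... #r-.

```python
# Hypothetical transliteration table: just enough letters for the
# example phrase. TUSTEP's real table is not given in this message.
TRANSLIT = {
    "М": "M", "П": "P", "а": "a", "е": "e", "и": "i", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ы": "y",
}


def is_cyrillic(ch: str) -> bool:
    """True for characters in the basic Cyrillic block U+0400..U+04FF."""
    return "\u0400" <= ch <= "\u04FF"


def import_text(raw: bytes, encoding: str) -> str:
    """Decode raw bytes and convert Cyrillic runs to ASCII plus script tags."""
    text = raw.decode(encoding)
    out = []
    pending = ""    # whitespace seen after a Cyrillic word, not yet placed
    in_run = False  # currently inside a #r+ ... #r- run?
    for ch in text:
        if is_cyrillic(ch):
            if not in_run:
                out.append("#r+")
                in_run = True
            out.append(pending)  # spaces between Cyrillic words stay inside
            pending = ""
            out.append(TRANSLIT[ch])  # KeyError for letters outside the sketch
        elif in_run and ch.isspace():
            pending += ch
        else:
            if in_run:
                out.append("#r-")
                in_run = False
            out.append(pending)
            pending = ""
            out.append(ch)
    if in_run:
        out.append("#r-")
    out.append(pending)
    return "".join(out)


phrase = "Материалы семинара по Платона"
print(import_text(phrase.encode("utf-8"), "utf-8"))
print(import_text(phrase.encode("utf-16"), "utf-16"))
```

Both calls print the same tagged ASCII string, since the source encoding 
only matters at the decoding step.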
The import/export-mechanism covers the complete UCS, and the computing 
centre has an ongoing Japanese project which publishes on the web (cf. 
http://www.uni-tuebingen.de/cyberreligion/). The support for CJK is, 
however, rudimentary and does not include typesetting, ordering, etc. TUSTEP 
offers full support (including support for combining diacritics) for the 
following scripts: Latin, Greek, Cyrillic, Hebrew, Arabic, Syriac 
(Estrangelo), Coptic and Devanagari. TUSTEP has experimental support for 
Armenian.
You can find more information in my short article 
http://www.uni-tuebingen.de/zdv/bi/bi99/bi997l1-unicode.html which you have 
already quoted. Please feel free to contact me on- or off-list if you need 
more information.
For clarity it should be said that I no longer work for Tübingen 
University's Computing Centre and that the views which are expressed in 
this post are thus those of an outside company, which may or may not 
coincide with those of the university.
                 Best regards,
                         Marc
*************************
Marc Wilhelm Küster
Saphor
XML and Internationalization
Fronländer 22
D-72072 Tübingen
Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114
Mail: kuester@saphor.net
URL: http://www.saphor.net
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT