> Looks interesting. How are you approaching the complication that transliteration is between pairs of languages?
I know what you mean: Gorbachev is Gorbatschow in German.
I think the rules we have in ICU are probably English-centric wherever the target language makes a difference.
Note that some of the transliterators, such as uppercasing and Any-Name, are just wrappers around Unicode functions and are therefore not language-dependent.
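To illustrate why such wrappers need no language-specific rules, here is a minimal Python sketch (not ICU code) that uses only the Unicode Character Database via the standard `unicodedata` module and Python's default case mapping. The function names `any_name` and `any_upper` are hypothetical, and the `\N{...}` output format is assumed to resemble what an Any-Name transliterator produces.

```python
import unicodedata

def any_name(text: str) -> str:
    """Replace each code point with \\N{NAME} from the Unicode Character Database.

    Hypothetical stand-in for an Any-Name style transliterator; falls back to
    U+XXXX for code points that have no name.
    """
    return "".join(
        "\\N{" + unicodedata.name(ch, "U+%04X" % ord(ch)) + "}" for ch in text
    )

def any_upper(text: str) -> str:
    """Uppercase using Unicode default (locale-independent) full case mapping."""
    return text.upper()

print(any_name("Ab"))        # \N{LATIN CAPITAL LETTER A}\N{LATIN SMALL LETTER B}
print(any_upper("straße"))   # STRASSE
```

Both results come straight from Unicode data, so they are the same whatever the target language is.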
The strength of the API is that you can roll your own rules, both at runtime and at compile time. If you need different rules for Finnish as the target language, you can modify the ICU rules or supply an entirely different set of your own.
The rules are written somewhat similarly to regular expressions.
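As a rough illustration of why per-target rule sets matter, here is a toy rule-based transliterator in plain Python. It is not ICU and does not use ICU's rule syntax (which is richer, with forms like `a > b ;`, contexts, and variables). It only mimics the idea of applying source-to-target rules left to right with longest match first. The rule tables `RU_EN` and `RU_DE` and the function `transliterate` are hypothetical and cover just enough Cyrillic to spell "Gorbachev" both ways.

```python
# Hypothetical toy rule tables (source -> target); NOT ICU's shipped rules.
# Multi-character keys act like context-sensitive rules: "чё" wins over "ч".
RU_EN = {
    "Г": "G", "о": "o", "р": "r", "б": "b", "а": "a",
    "в": "v", "чё": "che", "ч": "ch", "ё": "yo",
}
RU_DE = {
    "Г": "G", "о": "o", "р": "r", "б": "b", "а": "a",
    "в": "w", "чё": "tscho", "ч": "tsch", "ё": "jo",
}

def transliterate(text: str, rules: dict[str, str]) -> str:
    """Apply rules left to right, preferring the longest matching source string."""
    keys = sorted(rules, key=len, reverse=True)  # longest match first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no rule matches: copy the character unchanged
            i += 1
    return "".join(out)

print(transliterate("Горбачёв", RU_EN))  # Gorbachev
print(transliterate("Горбачёв", RU_DE))  # Gorbatschow
```

Swapping the rule table is all it takes to change the target convention, which is the same reason supplying your own rule set to ICU lets you target Finnish or German instead of English.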
See the (draft, somewhat outdated) user guide chapter: http://oss.software.ibm.com/icu/userguide/Transliteration.html
and the API references: http://oss.software.ibm.com/icu/apiref/class_Transliterator.html and http://oss.software.ibm.com/icu/apiref/utrans_h.html
markus
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 13:48:07 EDT