Re: Unicode transliterations (and other operations)

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Jul 03 2001 - 13:00:25 EDT


> Looks interesting. How are you approaching the complication that transliteration is between pairs of languages?

I know what you mean: Gorbachev is Gorbatschow in German.

I think the rules that ship with ICU are probably English-centric wherever the target language makes a difference.
Note that some of the transliterator functions, such as uppercasing and any-name, are just wrappers around Unicode functions and so are not language-dependent.
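
To illustrate why such operations do not depend on a language pair, here is a rough Python sketch (not ICU code). It uses only the standard library; the helper name to_names is made up for this example, and the \N{...} output form is my assumption about what a name-based transform would produce.

```python
import unicodedata

def to_names(text):
    """Replace each character with its Unicode character name.

    Hypothetical helper for illustration only; the \\N{...} output form
    is an assumption, not taken from the ICU sources.
    """
    return "".join("\\N{" + unicodedata.name(ch) + "}" for ch in text)

# Both operations consult only the Unicode character data,
# so no source/target language pair is involved.
upper = "gorbachev".upper()
named = to_names("a")
```

Neither call takes a language argument, which is the point: the result is determined by the character data alone.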

The strength of the API is that you can roll your own rules, either at runtime or at compile time. If you need different rules for Finnish as a target language for transliteration, you can modify the ICU rules or supply an entirely different set of your own.
The rule syntax is somewhat similar to regular expressions.
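
As a rough sketch of the idea of per-target-language rule sets (this is NOT ICU's rule syntax or API, just plain Python), one could keep a base table and layer target-specific overrides on top of it. The tables below are tiny, simplified assumptions that cover only the letters in one name, and make_transliterator is a made-up helper.

```python
# Illustrative only: simplified Cyrillic-to-Latin mappings, not ICU's rule data.
# "\u0451" is CYRILLIC SMALL LETTER IO, written as an escape to avoid
# any ambiguity between precomposed and decomposed forms.
BASE_RULES = {  # English-oriented defaults (assumed for this example)
    "Г": "G", "о": "o", "р": "r", "б": "b", "а": "a",
    "ч": "ch", "\u0451": "e", "в": "v",
}

# Overrides for German as the target language (assumed for this example).
GERMAN_OVERRIDES = {"ч": "tsch", "\u0451": "o", "в": "w"}

def make_transliterator(*rule_sets):
    """Build a function from rule tables; later tables override earlier ones."""
    rules = {}
    for rule_set in rule_sets:
        rules.update(rule_set)
    # Try longer source strings first so multi-character rules win.
    keys = sorted(rules, key=len, reverse=True)

    def transliterate(text):
        out = []
        i = 0
        while i < len(text):
            for key in keys:
                if text.startswith(key, i):
                    out.append(rules[key])
                    i += len(key)
                    break
            else:
                # No rule matched: copy the character unchanged.
                out.append(text[i])
                i += 1
        return "".join(out)

    return transliterate

to_english = make_transliterator(BASE_RULES)
to_german = make_transliterator(BASE_RULES, GERMAN_OVERRIDES)

name = "Горбач\u0451в"
english = to_english(name)
german = to_german(name)
```

With these tables the same source string comes out as "Gorbachev" for English and "Gorbatschow" for German, and a Finnish variant would just be another override table.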

See the (draft, somewhat outdated) user guide chapter: http://oss.software.ibm.com/icu/userguide/Transliteration.html
and the API references: http://oss.software.ibm.com/icu/apiref/class_Transliterator.html and http://oss.software.ibm.com/icu/apiref/utrans_h.html

markus


