The proposed transliteration mechanism, while already quite flexible
thanks to its rule mechanism, suffers from the principal weakness of
having to know the morphology of the underlying word.
For example, in Arabic ZDMG transcription, one can transliterate the
sequence [Xuwwx] (i.e. strong consonant - damma - waw + shadda - vowel)
in two ways: as [X 016B 0077 x] or as [X 0075 0077 0077 x], depending
on whether the first w represents the long vowel u or the consonant w.
The Arabic script itself does not distinguish between the two.
To transcribe this correctly, the system needs detailed knowledge
of Arabic noun and verb paradigms, which is probably beyond the scope
of rule-based transliteration in the ICU framework.
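To make the ambiguity concrete, here is a minimal Python sketch. It is
not ICU code: the function name and the two replacement strings are
assumptions made for illustration, and X and x are kept as Latin
placeholders exactly as in the example above. It only shows that a
character-level rule sees one input and has two valid outputs to choose
from, with nothing in the input to decide between them.

```python
# Illustrative sketch only; not part of ICU or any real transliterator.
DAMMA = "\u064F"   # ARABIC DAMMA
WAW = "\u0648"     # ARABIC LETTER WAW
SHADDA = "\u0651"  # ARABIC SHADDA

# The sequence whose reading depends on the morphology of the word.
AMBIGUOUS = DAMMA + WAW + SHADDA


def candidate_transliterations(text):
    """Return every reading a purely character-based rule could emit.

    Hypothetical helper: all other characters are passed through
    unchanged, and all occurrences of the ambiguous sequence are read
    the same way, so at most two candidates are returned.
    """
    if AMBIGUOUS not in text:
        return [text]
    long_vowel = text.replace(AMBIGUOUS, "\u016Bw")  # long u, then w
    geminate = text.replace(AMBIGUOUS, "uww")        # short u, doubled w
    return [long_vowel, geminate]


if __name__ == "__main__":
    for reading in candidate_transliterations("X" + AMBIGUOUS + "x"):
        # Print the code points so the two readings are easy to compare.
        print(" ".join(f"{ord(ch):04X}" for ch in reading))
```

Picking the right candidate is the part that needs the paradigm
knowledge; the sketch deliberately leaves that choice to the caller.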
Now I do admit that this is a highly specialized case. I could imagine
similar cases in other language/script environments as well, however.
Unless one designs an extremely complicated ruleset, automatic
transliteration will not achieve 100% accuracy (I don't know whether
that is your goal). Such a ruleset would go well beyond the scope
of character-based transliteration, though.
Greetings
Philipp mailto:uzsv2k@uni-bonn.de
__________________________
With searching comes loss / And the presence of absence / The server, not found