>I noticed that Torsten's scheme assumes that " " and "-" are mutually
>exclusive separators, but this is not true for a handful of Tibetan
>characters that have sequences like " -" or "- " (see list in l_xx.txt). How
>are these cases handled?
I use split(/[- ]/, ...) in Perl. For a sequence like " -" or "- ", this
yields an empty-string word between the ' ' and the '-', which is then
encoded like any other word. That's not optimal, but I haven't changed it yet.
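
To illustrate the behaviour described above, here is a minimal sketch. The
input string "ka -kha" is a made-up example for illustration, not an entry
taken from l_xx.txt, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: two words joined by the separator sequence " -".
my $name = "ka -kha";

# Split on a single ' ' or '-'. Two adjacent separators leave an
# empty field between them.
my @words = split(/[- ]/, $name);

# Show each resulting word in brackets so the empty one is visible.
# Prints: [ka] [] [kha]
print join(" ", map { "[$_]" } @words), "\n";
```

Note that Perl's split drops trailing empty fields by default, so only an
empty word between two separators (or at the start) shows up this way.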
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT