From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Dec 17 2002 - 09:07:56 EST
Bob Hallissy wrote:
> NB: One of the complexities you may run into, and which will limit your
> options, is that your encoding may store text in a different order than
> Unicode requires. If this is the case, TECkit can do the rearrangement for
> you but I'm not sure ICU will easily do that. Certainly the current
> standard for XML-based descriptions of encoding mappings as given in
> Unicode Technical Report 22 (see
> http://www.unicode.org/unicode/reports/tr22/ ) cannot express such
> mappings.
Someone pointed out to me recently that UTR#22 can indeed implement Indic
visual-to-logical mappings, provided that one chooses the whole Indic
"syllable" as a mapping unit. E.g.:
<a b="69 73 6B 27" u="0930 094D 0938 094D 0915 093F" c="र्स्कि" />
<!-- matraI+halfSa+Ka+Repha = Ra+Virama+Sa+Virama+Ka+matraI -->
Of course, this requires very big tables, which could be avoided by using a
smarter mechanism. Moreover, it only works with well-formed sequences in an
anticipated set of languages, and fails with misspellings or new
orthographies.
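
To make the idea concrete, here is a rough Python sketch of what a converter
driven by such a syllable-level table might do. It is not how TECkit or ICU
work internally. The single table entry is the one from the example above;
the names SYLLABLES and decode, the longest-match strategy, and the use of
U+FFFD for unmapped bytes are my own assumptions for illustration.

```python
# Syllable-level table: legacy bytes (visual order) -> Unicode (logical order).
# Only this one entry is taken from the example; a real table would need
# one entry per syllable, which is why it gets very big.
SYLLABLES = {
    bytes.fromhex("69736B27"): "\u0930\u094D\u0938\u094D\u0915\u093F",
}
MAX_LEN = max(len(key) for key in SYLLABLES)


def decode(data: bytes) -> str:
    """Convert legacy bytes to Unicode using longest-match lookup."""
    out = []
    i = 0
    while i < len(data):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_LEN, len(data) - i), 0, -1):
            chunk = data[i:i + n]
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += n
                break
        else:
            # No entry covers this position: the table did not anticipate
            # the sequence (misspelling, new orthography, ...).
            out.append("\uFFFD")
            i += 1
    return "".join(out)
```

A whole syllable that is in the table comes out reordered in one step, while
any sequence the table does not list falls through to U+FFFD byte by byte,
which is exactly the weakness mentioned above.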
_ Marco