RE: converting devanagari to mangal unicode

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Dec 17 2002 - 09:07:56 EST

    Bob Hallissy wrote:
    > NB: One of the complexities you may run into, and which will limit your
    > options, is that your encoding may store text in a different order than
    > Unicode requires. If this is the case, TECkit can do the rearrangement for
    > you but I'm not sure ICU will easily do that. Certainly the current
    > standard for XML-based descriptions of encoding mappings as given in
    > Unicode Technical Report 22 (see
    > http://www.unicode.org/unicode/reports/tr22/ ) cannot express such
    > mappings.

    Someone recently pointed out to me that UTR#22 can in fact express Indic
    visual-to-logical mappings, provided that one takes the whole Indic
    "syllable" as the mapping unit. E.g.:

            <a b="69 73 6B 27" u="0930 094D 0938 094D 0915 093F" c="र्स्कि" />
            <!-- matraI+halfSa+Ka+Repha = Ra+Virama+Sa+Virama+Ka+matraI -->
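To make the idea concrete, here is a minimal sketch of how a converter might consume such syllable-level entries, using a greedy longest-match lookup. The function name `convert_by_table` and the table itself are hypothetical. Only the first entry is taken from the example above; the second (a bare Ka) is inferred from the comment in that example and is an assumption about the font encoding.

```python
# Hypothetical syllable-level table: legacy font bytes -> Unicode text.
SYLLABLE_TABLE = {
    # From the UTR#22 example: matraI + halfSa + Ka + Repha
    bytes.fromhex("69 73 6B 27"): "\u0930\u094D\u0938\u094D\u0915\u093F",
    # Assumed entry (inferred from the example's comment): a bare Ka
    bytes.fromhex("6B"): "\u0915",
}


def convert_by_table(data: bytes, table=SYLLABLE_TABLE) -> str:
    """Convert legacy bytes to Unicode, one whole syllable at a time."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        # Try the longest candidate first, so that a full syllable
        # wins over any shorter entry that shares its first bytes.
        for n in range(min(max_len, len(data) - i), 0, -1):
            chunk = data[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No entry starts here: the sequence was not anticipated.
            raise ValueError(f"no mapping for bytes at offset {i}")
    return "".join(out)
```

Any syllable that is missing from the table raises an error here, even when every glyph in it is individually known.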

    Of course, this requires very large tables, which could be avoided by using a
    smarter mechanism. Moreover, it only works with well-formed sequences in an
    anticipated set of languages, and fails on misspellings or new
    orthographies.
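As a rough illustration of what a smarter mechanism could look like, the sketch below reorders one visual-order cluster by rule instead of listing every syllable. It is not how TECkit or ICU do it. The glyph classes are hypothetical and cover only the four codes from the example above (0x69 matra I, 0x73 half Sa, 0x6B Ka, 0x27 Repha), and the cluster shape (optional pre-base matra, any number of half forms, a base consonant, optional Repha) is a simplifying assumption.

```python
VIRAMA = "\u094D"
RA = "\u0930"

# Hypothetical glyph classes for the font encoding implied by the example.
PRE_BASE_MATRA = {0x69: "\u093F"}  # 'i' -> vowel sign I, drawn before the base
HALF_FORM = {0x73: "\u0938"}       # 's' -> half Sa
BASE = {0x6B: "\u0915"}            # 'k' -> Ka
REPHA = 0x27                       # "'" -> Repha, drawn after the base


def reorder_cluster(data: bytes, i: int = 0):
    """Convert one visual-order cluster at data[i:]; return (text, next index)."""
    matra = ""
    if i < len(data) and data[i] in PRE_BASE_MATRA:
        matra = PRE_BASE_MATRA[data[i]]
        i += 1
    halves = []
    while i < len(data) and data[i] in HALF_FORM:
        # A half form is stored logically as consonant + virama.
        halves.append(HALF_FORM[data[i]] + VIRAMA)
        i += 1
    if i >= len(data) or data[i] not in BASE:
        raise ValueError(f"expected a base consonant at offset {i}")
    base = BASE[data[i]]
    i += 1
    repha = ""
    if i < len(data) and data[i] == REPHA:
        # Repha is Ra + virama and comes first in logical order.
        repha = RA + VIRAMA
        i += 1
    # Logical order: Repha, half forms, base consonant, then the matra.
    return repha + "".join(halves) + base + matra, i


def convert_by_rules(data: bytes) -> str:
    """Convert a run of visual-order clusters to logical-order Unicode."""
    out = []
    i = 0
    while i < len(data):
        text, i = reorder_cluster(data, i)
        out.append(text)
    return "".join(out)
```

With these rules the tables grow with the number of glyphs, not with the number of syllables, and a cluster such as half Sa + half Sa + Ka converts even though nobody listed it in advance.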

    _ Marco


