Re: Transliterator

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Apr 28 2005 - 17:42:18 CST


    I had sent the following a few days ago, but was having some email problems
    so it didn't make it through.

    ----
    Yes, there are many different transliteration schemes. ICU follows ISO 15919
    mostly (we had to fill in a few holes where the standard transcribed instead
    of transliterated). If you want to see an example, go to
    http://ibm.com/software/globalization/icu/demo/transform
    In the Input box, paste in:
    यूनिकोड क्या है?
    यूनिकोड प्रत्येक अक्षर के लिए एक विशेष नम्बर प्रदान करता है,
    चाहे कोई भी प्लैटफॉर्म हो,
    चाहे कोई भी प्रोग्राम हो,
    चाहे कोई भी भाषा हो।
    Set Source 1 to Any, and Target 1 to Latin, and hit the Transform button.
    You'll get in Output 1 the following:
    yūnikōḍa kyā hai?
    yūnikōḍa pratyēka akṣara kē li'ē ēka viśēṣa nambara pradāna karatā hai,
    cāhē kō'ī bhī plaiṭaphŏrma hō,
    cāhē kō'ī bhī prōgrāma hō,
    cāhē kō'ī bhī bhāṣā hō.
    If you set Source 2 to Any, and Target 2 to Devanagari, and hit Transform, then
    you'll get the text transformed back. (Or you can pick other targets.)
    How well this all renders is up to your browser and available fonts.
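
For anyone curious what such a transform does mechanically, here is a minimal Java sketch. It is not ICU's rule set and does not use the ICU API. It is a hand-written table covering only a handful of letters, and the class and method names (DevanagariSketch, toLatin) are made up for illustration. It shows the three cases behind the output above: a consonant followed by a vowel sign takes that vowel, a consonant followed by a virama takes no vowel, and any other consonant gets the inherent "a".

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny subset of Devanagari, not ICU's actual rules.
final class DevanagariSketch {
    private static final char VIRAMA = '\u094D';
    private static final Map<Character, String> CONSONANTS = new HashMap<>();
    private static final Map<Character, String> MATRAS = new HashMap<>();
    private static final Map<Character, String> VOWELS = new HashMap<>();

    static {
        // Consonants (each carries an inherent "a" unless overridden).
        CONSONANTS.put('\u0915', "k");
        CONSONANTS.put('\u091A', "c");
        CONSONANTS.put('\u0921', "\u1E0D"); // d with dot below
        CONSONANTS.put('\u0924', "t");
        CONSONANTS.put('\u0928', "n");
        CONSONANTS.put('\u092A', "p");
        CONSONANTS.put('\u092D', "bh");
        CONSONANTS.put('\u092E', "m");
        CONSONANTS.put('\u092F', "y");
        CONSONANTS.put('\u0930', "r");
        CONSONANTS.put('\u0932', "l");
        CONSONANTS.put('\u0939', "h");

        // Dependent vowel signs (matras) that replace the inherent vowel.
        MATRAS.put('\u093E', "\u0101"); // a with macron
        MATRAS.put('\u093F', "i");
        MATRAS.put('\u0940', "\u012B"); // i with macron
        MATRAS.put('\u0941', "u");
        MATRAS.put('\u0942', "\u016B"); // u with macron
        MATRAS.put('\u0947', "\u0113"); // e with macron
        MATRAS.put('\u0948', "ai");
        MATRAS.put('\u094B', "\u014D"); // o with macron
        MATRAS.put('\u094C', "au");

        // Independent vowels.
        VOWELS.put('\u0905', "a");
        VOWELS.put('\u0906', "\u0101");
        VOWELS.put('\u0907', "i");
        VOWELS.put('\u0908', "\u012B");
        VOWELS.put('\u0909', "u");
        VOWELS.put('\u090A', "\u016B");
        VOWELS.put('\u090F', "\u0113");
        VOWELS.put('\u0910', "ai");
        VOWELS.put('\u0913', "\u014D");
        VOWELS.put('\u0914', "au");
    }

    static String toLatin(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String consonant = CONSONANTS.get(c);
            if (consonant != null) {
                out.append(consonant);
                char next = (i + 1 < text.length()) ? text.charAt(i + 1) : '\0';
                if (next == VIRAMA) {
                    i++;                         // virama: suppress the vowel
                } else if (MATRAS.containsKey(next)) {
                    out.append(MATRAS.get(next)); // matra replaces inherent "a"
                    i++;
                } else {
                    out.append('a');             // inherent vowel
                }
            } else if (VOWELS.containsKey(c)) {
                out.append(VOWELS.get(c));
            } else {
                out.append(c);                   // pass everything else through
            }
        }
        return out.toString();
    }
}
```

With this table, toLatin turns यूनिकोड into yūnikōḍa and क्या into kyā, matching the demo output for those words. It does not reproduce details such as the apostrophe in li'ē, and any letter missing from the table is passed through unchanged.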
    Mark
    ----- Original Message ----- 
    From: "John Hudson" <tiro@tiro.com>
    To: "Chetan Pandey" <chetanpandey@yahoo.com>
    Cc: <unicode@unicode.org>
    Sent: Monday, April 25, 2005 22:06
    Subject: Re: Transliterator
    > Chetan Pandey wrote:
    >
    > > [a + BAR ABOVE] for "aa" as in balm,
    > > [i + BAR ABOVE] for "ii" as in meat,
    > > [u + BAR ABOVE] for "uu" as in boot,
    > > [m + DOT ABOVE] for "M" as in saMgiita
    >
    > > If someone can pls tell me what this Scheme is called and where it is
    > > represented in Unicode, I will be very grateful.
    >
    > There are two Latin transliteration systems for Hindi that use these
    > characters, ISO 15919 (2001) and the United Nations standard (1977). These
    > systems are very similar, but there are differences in the transliteration
    > of a few vowels and a couple of consonants. For more information see this
    > PDF:
    >
    >     http://transliteration.eki.ee/pdf/Hindi-Marathi-Nepali.pdf
    >
    >
    > Not all of the diacritics used in these transliteration systems are encoded
    > in Unicode as combined letter + mark combinations. For some of them you will
    > need to use sequences of base letters and combining marks.
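
One way to check which of these letters have a precomposed form is to normalize the sequence to NFC and see whether it collapses to a single code point. A minimal Java sketch using the standard java.text.Normalizer follows; the class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Illustrative helper: names are hypothetical, only java.text.Normalizer is real API.
final class CombiningMarksSketch {
    // Compose base letter + combining mark where Unicode has a precomposed character.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // True if the NFC form of the sequence is a single code point.
    static boolean hasPrecomposedForm(String s) {
        String nfc = compose(s);
        return nfc.codePointCount(0, nfc.length()) == 1;
    }
}
```

For example, a + U+0304 (combining macron) composes to U+0101, and m + U+0307 (combining dot above) composes to U+1E41, so both have precomposed forms. By contrast, r + U+0325 (combining ring below, which ISO 15919 uses for vocalic r) stays as two code points and has to be entered as a sequence.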
    >
    > John Hudson
    >
    > -- 
    >
    > Tiro Typeworks        www.tiro.com
    > Vancouver, BC        tiro@tiro.com
    >
    > Currently reading:
    > A century of philosophy, by Hans Georg Gadamer
    > Q, by 'Luther Blissett'
    >
    >
    >
    Mark
    ----- Original Message ----- 
    From: "Markus Scherer" <markus.icu@gmail.com>
    To: <unicode@unicode.org>
    Sent: Thursday, April 28, 2005 15:24
    Subject: Re: Transliterator
    > On 4/25/05, Chetan Pandey <chetanpandey@yahoo.com> wrote:
    > > I am trying to build a Java program that will convert Devanagari Input into
    > > the English Transliteration System...
    >
    > You might be able to use ICU, which has built-in transliteration
    > between all Indic scripts and Latin. If you need different rules, you
    > can supply your own rule set to ICU's Transliterator API.
    >
    > Try the Transform demo on
    > http://www-306.ibm.com/software/globalization/icu/chartsdemostools.jsp
    >
    > with Source 1 = Devanagari and Target 1 = Latin.
    >
    > Best regards,
    > markus
    >
    >
    >
    >
    

