Re: Indic Devanagari Query

From: Keyur Shroff (keyur_shroff@yahoo.com)
Date: Wed Jan 29 2003 - 02:54:34 EST

    Hi Aditya,

    --- Aditya Gokhale <aditya@cdacindia.com> wrote:
    > I had few query regarding representation of Devanagari script in
    > Unicode
    > (Code page - 0x0900 - 0x097F). Devanagari is a writing script, is used in
    > Hindi, Marathi and Sanskrit languages. I have following questions -
    >
    >
    > In the same script code page, how do I use these two different Glyphs, to
    > represent the same character ? Is there any way by which I can do it in
    > an Open type font and Free type font implementation ?

    Yes, it is certainly possible with an OpenType font. Please note that
    FreeType is not a font format; it is a rendering library used to rasterize
    different kinds of fonts, including TrueType and OpenType fonts.

    In an OpenType font, you can include all the glyphs with alternate shapes
    and then select one of them depending on the script and language. The
    application should specify the script and language tags when sending
    character codes to the OpenType rendering library/engine. All substitutions
    then take place according to the selected language and/or script. There
    should be a default script in the font. Similarly, each script has a
    default language, which is used as the fallback if the application does not
    specify which language to use for processing.

    From the list of alternate glyphs, you may want to use the glyph for the
    default language as the entry in the cmap table. This default glyph can
    then be replaced by an alternate glyph depending on the language specified.
    To do this, use the GSUB table and write language-dependent lookups for
    the substitution.
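
    The selection described above can be sketched as a toy model in Python.
    This is not a real OpenType engine or font API: the glyph names, the
    tables, and the shape() function are hypothetical, and the choice of
    LA and SHA as letters with Marathi alternates is only illustrative.

```python
# Toy model of language-dependent glyph selection (all names hypothetical).

DEFAULT_LANG = "dflt"

# cmap-like table: character code -> glyph for the default language.
CMAP = {0x0932: "la", 0x0936: "sha"}

# GSUB-like lookups keyed by (script tag, language tag).
# Only languages that need alternate shapes get an entry.
LANG_LOOKUPS = {
    ("deva", "MAR"): {"la": "la.mar", "sha": "sha.mar"},
}


def shape(text, script="deva", lang=DEFAULT_LANG):
    """Map characters to default glyphs, then apply language substitutions."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    # Fall back to the default glyphs if no lookup exists for this language.
    lookup = LANG_LOOKUPS.get((script, lang), {})
    return [lookup.get(glyph, glyph) for glyph in glyphs]


print(shape("\u0936"))              # default language
print(shape("\u0936", lang="MAR"))  # Marathi alternate
```

    The character code stays the same in both calls; only the language tag
    passed by the application changes which glyph is chosen.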

    >
    > 2. Implementation Query -
    > In an implementation where I need to send / process Hindi, Marathi
    > and Sanskrit data, how do I differentiate between languages (Hindi,
    > Marathi and Sanskrit). Say for example, I am writing a translation
    > engine, and I want to translate a document having Hindi, Marathi and
    > Sanskrit Text in it, how do I know from the code points between 0x0900
    > and 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

    Unicode is not divided into code pages. Unlike some older encodings, the
    entire Unicode standard forms a single code page. For readability and
    quick reference, however, the code charts are divided into sections
    (blocks), such as Devanagari at U+0900 - U+097F, which you might interpret
    as code pages.

    > I would suggest that we should give different code pages for Marathi,
    > Hindi and Sanskrit. May be current code page of Devanagari can be traded
    > as Hindi and two new code pages for Marathi and Sanskrit be added. This
    > could solve these issues. If there is any better way of solving this, any
    > one suggest.

    Unicode assigns code points to scripts, not to languages. In fact, it is
    not desirable to give separate code points to individual languages that
    share the same script. Also, Unicode encodes characters, which have
    abstract meaning and properties; it does not encode glyphs. The glyph
    shapes shown in the Unicode charts are given only for convenience and do
    not prescribe the shapes to be used in a font. The shape of the glyph for a
    Unicode character may vary from one font to another. Since it is already
    possible to select the proper glyph(s) depending on the language selection,
    this scheme is suitable for all Indian languages.
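
    For the translation-engine case, this means the language has to travel
    with the text as separate information, because the code points alone
    cannot carry it. A minimal sketch, assuming the application stores a
    language tag ("hi", "mr", "sa") next to each run of text; the function
    names and the dictionary layout are hypothetical:

```python
# Code points identify the script, not the language (helper names hypothetical).

def is_devanagari(ch):
    """True if the character lies in the Devanagari range U+0900..U+097F."""
    return 0x0900 <= ord(ch) <= 0x097F


def tagged_run(text, lang):
    """Keep the language as metadata next to the text, not inside it."""
    return {"lang": lang, "text": text}


word = "\u0928\u092E"  # DEVANAGARI LETTER NA, DEVANAGARI LETTER MA

hindi = tagged_run(word, "hi")
marathi = tagged_run(word, "mr")

# Same script and same code points; only the tag tells the runs apart.
print(all(is_devanagari(ch) for ch in word))
print(hindi["text"] == marathi["text"], hindi["lang"], marathi["lang"])
```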

    >
    >
    > 3. Character codes for jna, shra, ksh -
    >
    > In Sanskrit and Marathi jna, shra and ksh are considered as separate
    > characters and not ligatures. How do we take care of this ? Can I get
    > over all views on the matter from the group ? In my opinion they should
    > be given different code points in the specific language code page.
    > Please find below the character glyphs -
    >
    > jna
    > shra
    > ksh

    All of the above can be composed from the following consonant clusters:
      jna -> ja halant nya
      shra -> sha halant ra
      ksh -> ka halant ssha
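
    As a small illustration, the three sequences above can be written out as
    code points in Python. The halant is encoded as U+094D DEVANAGARI SIGN
    VIRAMA; the CLUSTERS table and the describe() helper are names made up
    for this sketch.

```python
import unicodedata

VIRAMA = "\u094D"  # halant

# Each conjunct is a plain sequence: consonant + virama + consonant.
CLUSTERS = {
    "jna": "\u091C" + VIRAMA + "\u091E",   # ja + halant + nya
    "shra": "\u0936" + VIRAMA + "\u0930",  # sha + halant + ra
    "ksh": "\u0915" + VIRAMA + "\u0937",   # ka + halant + ssha
}


def describe(cluster):
    """Return the Unicode character names that make up a cluster."""
    return [unicodedata.name(ch) for ch in cluster]


for label, cluster in CLUSTERS.items():
    print(label, [f"U+{ord(ch):04X}" for ch in cluster], describe(cluster))
```

    Whether a sequence is displayed as a single conjunct shape depends on
    the font and the rendering engine, not on the encoding.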

    The point that the above sequences are considered characters in some
    Indian languages has merit. If there is demand from native speakers, a
    proposal can be submitted to Unicode; there is a predefined procedure for
    proposal submission. Once the proposal has been discussed with the people
    concerned and agreed upon, these ligatures could be added to the Devanagari
    script itself, because the Devanagari script represents all three languages
    you mentioned, namely Sanskrit, Marathi, and Hindi. Meanwhile, you can
    write rules for composing them from the consonant clusters.

    Regards,
    Keyur



