Re: Indic Devanagari Query

From: Keyur Shroff (keyur_shroff@yahoo.com)
Date: Wed Jan 29 2003 - 02:54:34 EST

Next message: Asmus Freytag: "Re: LATIN LETTER N WITH DIAERESIS?"

Previous message: Aditya Gokhale: "Indic Devanagari Query"
In reply to: Aditya Gokhale: "Indic Devanagari Query"
Next in thread: Asmus Freytag: "Re: Indic Devanagari Query"
Reply: Asmus Freytag: "Re: Indic Devanagari Query"
Reply: Aditya Gokhale: "Re: Indic Devanagari Query"

Hi Aditya,

--- Aditya Gokhale <aditya@cdacindia.com> wrote:
> I had few query regarding representation of Devanagari script in
> Unicode
> (Code page - 0x0900 - 0x097F). Devanagari is a writing script, is used in
> Hindi, Marathi and Sanskrit languages. I have following questions -
>
>
> In the same script code page, how do I use these two different Glyphs, to
> represent the same character ? Is there any way by which I can do it in
> an Open type font and Free type font implementation ?

Yes, this is certainly possible with an OpenType font. Please note that
FreeType is not a font format; it is a rendering library used to rasterize
different kinds of fonts, including TrueType and OpenType fonts.

In an OpenType font, you can include all the glyphs with alternate shapes
and then select one of them depending on the script and language. The
application should specify the script and language tags when sending
character codes to the OpenType rendering library/engine. All substitutions
then take place according to the language and/or script selection. There
should be a default script in the font. Similarly, each script has a default
language, which is used as the fallback if the application does not specify
which language to use for processing.

From the list of alternate glyphs, you may want to use the glyph for the
default language as the entry in the cmap table. This default glyph can then
be replaced by an alternate glyph depending on the language specified. To do
this, you have to use the GSUB table and write language-dependent lookups
for the substitution.
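
To make the cmap-plus-GSUB idea concrete, here is a rough sketch in Python. It is only a toy model, not a real OpenType engine: the glyph names, the choice of U+0932 as the character with an alternate shape, and the simplified language tags ("dflt", "MAR") are all assumptions made for illustration.

```python
# Toy model of language-dependent glyph selection (NOT a real OpenType engine).
# Glyph names and the choice of U+0932 are hypothetical, for illustration only.

# cmap-like table: code point -> glyph for the default language.
CMAP = {0x0932: "la.default"}

# GSUB-like lookups keyed by (script tag, language tag).
# Each lookup replaces a default glyph with a language-specific alternate.
GSUB = {
    ("deva", "MAR"): {"la.default": "la.alt"},
}

DEFAULT_SCRIPT = "deva"
DEFAULT_LANGUAGE = "dflt"


def shape(text, script=DEFAULT_SCRIPT, language=DEFAULT_LANGUAGE):
    """Map characters to default glyphs, then apply language substitutions."""
    glyphs = [CMAP.get(ord(ch), ".notdef") for ch in text]
    # No lookup for this script/language means the default glyphs are kept.
    lookup = GSUB.get((script, language), {})
    return [lookup.get(glyph, glyph) for glyph in glyphs]
```

With this model, shape("\u0932") keeps the default glyph, while shape("\u0932", language="MAR") picks the alternate. A language with no lookup of its own falls back to the default shape, which mirrors the fallback behaviour described above.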

>
> 2. Implementation Query -
> In an implementation where I need to send / process Hindi, Marathi
> and Sanskrit data, how do I differentiate between languages (Hindi,
> Marathi and Sanskrit). Say for example, I am writing a translation
> engine, and I want to translate a document having Hindi, Marathi and
> Sanskrit Text in it, how do I know from the code points between 0x0900
> and 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

Unicode is not divided into code pages. Unlike some older encodings, there
is effectively only one code page for the entire Unicode Standard. However,
for better readability and quick reference, the code charts are divided into
sections (blocks), which you might interpret as code pages.
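
As a small illustration, the range you quoted (0x0900 - 0x097F) can be used to test whether text is in the Devanagari script, but the same test gives the same answer for Hindi, Marathi and Sanskrit text. The sketch below assumes Python, and the function names are my own; the language itself would have to be carried separately, for example in markup or metadata.

```python
# Range of the Devanagari section of the chart, as quoted in the question.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F


def in_devanagari_block(ch):
    """True if the single character ch lies in 0x0900 - 0x097F."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


def is_devanagari_text(text):
    """True if text has at least one non-space character and all of its
    non-space characters lie in the Devanagari range."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(in_devanagari_block(ch) for ch in chars)
```

This identifies the script only. Nothing in the code points says which language the text is written in.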

> I would suggest that we should give different code pages for Marathi,
> Hindi and Sanskrit. May be current code page of Devanagari can be traded
> as Hindi and two new code pages for Marathi and Sanskrit be added. This
> could solve these issues. If there is any better way of solving this, any
> one suggest.

Unicode assigns code points to scripts only, not to languages. In fact, it
is not desirable to assign separate code points to individual languages that
share the same script. Also, Unicode encodes characters, which have abstract
meaning and properties; it does not encode glyphs. The glyph shapes shown in
the Unicode charts are given just for convenience and do not prescribe the
shapes to be used in a font. The shape of the glyph for a Unicode character
may vary from one font to another. Since it is already possible to select
the proper glyph(s) depending on the language selection, the current scheme
is suitable for all Indian languages.

>
>
> 3. Character codes for jna, shra, ksh -
>
> In Sanskrit and Marathi jna, shra and ksh are considered as separate
> characters and not ligatures. How do we take care of this ? Can I get
> over all views on the matter from the group ? In my opinion they should
> be given different code points in the specific language code page.
> Please find below the character glyphs -
>
> jna
> shra
> ksh

All of the above can be composed from the following consonant clusters:
  jna -> ja halant nya
  shra -> sha halant ra
  ksh -> ka halant ssha
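
For reference, the three sequences above can be written out with their code points. This is a minimal Python sketch; the dictionary and function names are my own, and the character names are looked up from Python's unicodedata module.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Each cluster is consonant + halant + consonant.
CLUSTERS = {
    "jna": "\u091C" + VIRAMA + "\u091E",   # ja + halant + nya
    "shra": "\u0936" + VIRAMA + "\u0930",  # sha + halant + ra
    "ksh": "\u0915" + VIRAMA + "\u0937",   # ka + halant + ssha
}


def describe(cluster):
    """List the code point and Unicode name of each character in a cluster."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]
```

For example, describe(CLUSTERS["ksh"]) lists U+0915, U+094D and U+0937 in that order. How the cluster is drawn (as a ligature or otherwise) is then up to the font.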

The point that the above sequences are considered characters in some Indian
languages has merit. If there is demand from native speakers, a proposal
can be submitted to Unicode; there is a predefined procedure for proposal
submission. Once the proposal has been discussed with the people concerned
and agreed upon, these ligatures can be added to the Devanagari script
itself, because the Devanagari script represents all three languages you
mentioned, namely Sanskrit, Marathi, and Hindi. Meanwhile, you can write
rules for composing them from the consonant clusters.
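
One possible shape for such rules on the text-processing side, sketched in Python: treat each of the three sequences as a single unit when splitting text, and everything else as one unit per code point. This is only an assumption about how an application might do it; the names are my own, and a real implementation would also need to handle vowel signs and other conjuncts.

```python
# The three sequences to be treated as single "characters".
CONJUNCTS = (
    "\u091C\u094D\u091E",  # jna:  ja + halant + nya
    "\u0936\u094D\u0930",  # shra: sha + halant + ra
    "\u0915\u094D\u0937",  # ksh:  ka + halant + ssha
)


def split_units(text):
    """Split text into units, keeping each listed conjunct together."""
    units = []
    i = 0
    while i < len(text):
        for conjunct in CONJUNCTS:
            if text.startswith(conjunct, i):
                units.append(conjunct)
                i += len(conjunct)
                break
        else:
            # No conjunct starts here, so emit a single code point.
            units.append(text[i])
            i += 1
    return units
```

With this, the sequence ka halant ssha followed by ka is split into two units rather than four, so sorting or counting code can treat ksh as one letter.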

Regards,
Keyur



This archive was generated by hypermail 2.1.5 : Wed Jan 29 2003 - 03:37:07 EST