From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Jan 29 2003 - 05:22:59 EST
Aditya Gokhale wrote:
> Hello Everybody,
> I had few query regarding representation of Devanagari
> script in Unicode
All your questions are FAQs, so I'll just reference the entries which
answer them.
> (Code page - 0x0900 - 0x097F). Devanagari is a writing
> script, is used in Hindi, Marathi and Sanskrit languages. I
> have following questions -
Unicode has no code pages:
http://www.unicode.org/faq/basic_q.html#18
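As a rough illustration of the difference, here is a minimal Python sketch. It treats U+0900..U+097F as a range of code points (the Devanagari block) and shows that one code point can be serialized in more than one encoding form. The names DEVANAGARI_START, DEVANAGARI_END and in_devanagari_block are made up for this example and are not part of any library.

```python
# The Devanagari block is a range of code points, not a code page.
DEVANAGARI_START = 0x0900  # hypothetical constant names, for illustration only
DEVANAGARI_END = 0x097F


def in_devanagari_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0900..U+097F."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END


ka = "\u0915"  # DEVANAGARI LETTER KA

# The code point is the same whatever byte encoding is used to store it.
utf8_bytes = ka.encode("utf-8")
utf16_bytes = ka.encode("utf-16-be")

print(in_devanagari_block(ka), in_devanagari_block("a"))
print(utf8_bytes.hex(), utf16_bytes.hex())
```

The block membership test works on code points only; the byte sequences differ because UTF-8 and UTF-16 are different encoding forms of the same character.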
> 1. In Marathi and Sanskrit language two characters glyphs of
> 'la' and 'sha' are represented differently as shown in the
> image below -
> (First glyph is 'la' and second one is 'sha')
> as compared to Hindi where these character glyphs are
> represented as shown in the image below -
> (First glyph is 'la' and second one is 'sha')
Unicode encodes (abstract) characters, not glyphs:
http://www.unicode.org/faq/han_cjk.html#3
(This FAQ is in the Chinese/Japanese/Korean section because it is more often
raised for Chinese ideograms.)
> In the same script code page, how do I use these two
> different Glyphs, to represent the same character ? Is there
> any way by which I can do it in an Open type font and Free
> type font implementation ?
Unicode's requirements for fonts:
http://www.unicode.org/faq/font_keyboard.html#1
A few links to OpenType stuff:
http://www.unicode.org/faq/font_keyboard.html#4
> 2. Implementation Query -
> In an implementation where I need to send / process
> Hindi, Marathi and Sanskrit data, how do I differentiate
> between languages (Hindi, Marathi and Sanskrit). Say for
> example, I am writing a translation engine, and I want to
> translate a document having Hindi, Marathi and Sanskrit Text
> in it, how do I know from the code points between 0x0900 and
> 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?
What you need here is some sort of language tagging:
http://www.unicode.org/faq/languagetagging.html
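One possible way to do this in an application is to keep the language outside the text itself, as metadata attached to each run of text. The following is only a sketch under that assumption: TaggedText, document and texts_for are hypothetical names, the tags "hi", "mr" and "sa" are the usual two-letter language codes for Hindi, Marathi and Sanskrit, and the sample strings are arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedText:
    """A run of text plus an out-of-band language tag (hypothetical type)."""
    lang: str  # e.g. "hi" (Hindi), "mr" (Marathi), "sa" (Sanskrit)
    text: str


# Sample runs; all three use code points from the same Devanagari block.
document: List[TaggedText] = [
    TaggedText("hi", "\u0928\u092e\u0938\u094d\u0924\u0947"),
    TaggedText("mr", "\u0928\u092e\u0938\u094d\u0915\u093e\u0930"),
    TaggedText("sa", "\u0928\u092e\u0903"),
]


def texts_for(lang: str, doc: List[TaggedText]) -> List[str]:
    """Select the runs carrying the given language tag."""
    return [run.text for run in doc if run.lang == lang]


def all_devanagari(text: str) -> bool:
    """True if every character lies in U+0900..U+097F."""
    return all(0x0900 <= ord(c) <= 0x097F for c in text)


# The code points alone cannot tell the languages apart; the tag can.
print([all_devanagari(run.text) for run in document])
print(texts_for("mr", document))
```

In markup formats the same idea is normally expressed with the format's own language attribute rather than a custom structure like this one.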
> I would suggest that we should give different code pages
> for Marathi, Hindi and Sanskrit. May be current code page of
> Devanagari can be traded as Hindi and two new code pages for
> Marathi and Sanskrit be added. This could solve these issues.
> If there is any better way of solving this, any one suggest.
Characters are encoded "per script", not "per language":
http://www.unicode.org/faq/basic_q.html#17
> 3. Character codes for jna, shra, ksh -
>
> In Sanskrit and Marathi jna, shra and ksh are considered as
> separate characters and not ligatures. How do we take care of
> this ? Can I get over all views on the matter from the group
> ? In my opinion they should be given different code points in
> the specific language code page.
> Please find below the character glyphs -
Unicode encodes Indic analytically:
http://www.unicode.org/faq/indic.html#17
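To make "analytically" concrete: each of these conjuncts is stored as consonant + virama + consonant, and the font is expected to render the sequence as a single shape. A small Python sketch using the standard unicodedata module lists the code points; the CONJUNCTS table and the describe helper are names invented for this example.

```python
import unicodedata

# Each conjunct is a sequence of existing characters, not a separate code point.
CONJUNCTS = {
    "jna": "\u091C\u094D\u091E",   # JA + VIRAMA + NYA
    "ksha": "\u0915\u094D\u0937",  # KA + VIRAMA + SSA
    "shra": "\u0936\u094D\u0930",  # SHA + VIRAMA + RA
}


def describe(text: str) -> list:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


for label, sequence in CONJUNCTS.items():
    print(label, describe(sequence))
```

An application that wants to treat such a conjunct as one unit (for sorting or cursor movement, say) would do so by recognizing the sequence, not by expecting a dedicated code point.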
> thanks,
For more details about Devanagari in Unicode, see Chapter 9 of the Standard:
http://www.unicode.org/uni2book/ch09.pdf
_ Marco