Devanagari

From: Aman Chawla (creativezeal@hotmail.com)
Date: Sun Jan 20 2002 - 00:43:30 EST


I would be grateful if I could get opinions on the following:

1. Which encoding/character set is most suitable for using Hindi/Marathi (both of which use Devanagari) on the internet as well as in databases, and why? In your response, please refer to: http://www.iiit.net/ltrc/Publications/iscii_plugin_display.html, particularly the following paragraphs:
"Many people hope that the standardization problem will get solved because of Unicode. However there is an issue of transmission efficiency. The transmission cost for Indian languages will be three times that of English! The real culprit being UTF-8. UTF-8 converts Unicode two-byte codes to byte sequence of one to four bytes. In the process they make sure that ASCII part of the Unicode is transmitted as single byte only. So for a language like English which uses only 0-127 part of the code there is no overhead. European languages use only a few character codes in the region 128-255 in addition to 0-127 part. So in the case of the Europian languages the transmission of this portion may incur some overhead say of the order of 10%.

In contrast to above cases Indian languages use no part of the code in region 0-127. Secondly Indian character codes occupy less than 127 codes for each language. So what could have been transmitted in one byte if one uses ASCII will be transmitted in a sequence of two to four bytes. This amounts to extra overhead of 200%!"

2. Related to question 1, what can be done to encourage/force the use of a standardised encoding for Devanagari on the Internet?

3. With reference to the previous question, can programs that convert the myriad Devanagari encodings in use today to a standard encoding (question 1) be made freely available, and how?

4. Is there any search engine on the Internet that maintains an up-to-date index of sites in Devanagari? If not, what can be done to encourage proprietary search engines to support Hindi? Google supposedly has a Hindi language option, but surprise, it's in Roman script! Several emails to them have elicited the response: "At the moment we don't support Devanagari..."
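
The byte counts claimed in the passage quoted under question 1 can be checked directly. The following is only a minimal sketch in Python: the sample words and the helper name encoded_sizes are illustrative assumptions and do not come from the quoted page. It compares the size of an English word and a Hindi word in UTF-8 and in UTF-16.

```python
# Illustrative sample words (assumptions, not taken from the quoted page).
samples = {
    "english": "hello",
    # Hindi "namaste", written as six Devanagari code points:
    # U+0928 U+092E U+0938 U+094D U+0924 U+0947
    "hindi": "\u0928\u092e\u0938\u094d\u0924\u0947",
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes without a BOM)."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )


for name, text in samples.items():
    points, utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{name}: {points} code points, "
          f"{utf8_bytes} UTF-8 bytes, {utf16_bytes} UTF-16 bytes")
```

Each code point in the Devanagari block (U+0900 to U+097F) takes exactly three bytes in UTF-8 and two bytes in UTF-16, so the "200%" figure in the quote holds only relative to a one-byte-per-character encoding such as ISCII, and "two to four bytes" overstates the range for this script.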

Thanks,

Aman Chawla


