Twenty-second International Unicode Conference

Unicode for Encoding Indian Language Databases: A Case Study of Hindi and Kannada Scripts

Shalini Urs - University of Mysore

Intended Audience:	Content Developers, Managers, Technical Writers, Library and Information Professionals
Session Level:	Intermediate

Statement of Purpose:

Few practical accounts of Unicode implementation for Indian languages exist. This paper shares our experience in using Unicode to encode two Indian languages, Hindi and Kannada, and aims to debate and resolve some of the challenges and issues that arise when implementing Unicode for multilingual, multi-script database applications.

Paper Description:

Any initiative to build multilingual databases of Indian materials has to confront the challenges of encoding Indic scripts. Vidyanidhi, the Indian Digital Library of Electronic Theses, is an initiative with a mandate and a mission to build an online resource of theses submitted to Indian universities. Vidyanidhi is conceived and implemented in two layers: a top layer of metadata and a second layer holding the full text of the theses. Given Vidyanidhi's goal of long-term archiving, Unicode was chosen as the encoding for the multilingual, multi-script metadata database. As a pilot, Unicode implementation for the Hindi and Kannada scripts has been explored. The Vidyanidhi metadata database currently holds six hundred records in Hindi and five hundred records in Kannada. This paper demonstrates the practical implementation of Unicode for the Hindi and Kannada scripts on the Microsoft platform.

To set the context, the paper first gives a brief description of the Vidyanidhi project, its vision, goals, objectives and strategies, followed by a discussion of Indian languages and scripts. It then provides a technical narrative of how the Microsoft platform (SQL, Windows 2000 Server, Office XP) was used to enable Unicode implementation for the Hindi and Kannada scripts. The paper concludes with a discussion of the specific issues and problems encountered in the entry, display and collation of Indic character sets in Unicode.
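
As a rough illustration of what encoding these scripts in Unicode means in practice, the following Python sketch lists the code points stored for one Hindi and one Kannada word. It is not taken from the paper: the sample words and the helper names `code_points` and `in_block` are assumptions chosen for this example, and it uses only Python's standard `unicodedata` module rather than the Microsoft tools the paper describes.

```python
import unicodedata

# Illustrative sample words (assumptions, not records from Vidyanidhi),
# written as escapes so the stored code point order is explicit.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"   # "Hindi" in Devanagari
KANNADA = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"       # "Kannada" in Kannada script


def code_points(text):
    """Return (U+XXXX, character name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def in_block(text, first, last):
    """True if every character of text lies in the range first..last."""
    return all(first <= ord(ch) <= last for ch in text)


if __name__ == "__main__":
    for label, word in (("Hindi", HINDI), ("Kannada", KANNADA)):
        print(label, word)
        for cp, name in code_points(word):
            print("  ", cp, name)
        # Byte counts differ by encoding form, which matters for storage.
        print("   UTF-8 bytes:", len(word.encode("utf-8")),
              "| UTF-16 bytes:", len(word.encode("utf-16-le")))
```

In the Hindi word, the vowel sign U+093F is stored after the consonant U+0939 even though it is drawn to the consonant's left. This gap between stored order and visual order is one general reason why entry and display of Indic text need script-aware software.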


23 May 2002, Webmaster