Unicode for Encoding Indian Language Databases: A Case Study of Hindi and Kannada Scripts
Shalini Urs - University of Mysore
Intended Audience: Content Developers, Managers, Technical Writers, Library and Information Professionals
Session Level: Intermediate
Statement of Purpose:
Practical accounts of Unicode implementation for Indian languages are scarce.
The purpose of this paper is to share our experience of using Unicode to encode
two Indian languages, Hindi and Kannada, and to debate and resolve some of the
challenges and issues in implementing Unicode for multilingual, multiscript
database applications.
Paper Description:
Any initiative to build multilingual databases of Indian materials has to
confront the challenge of encoding Indic scripts. Vidyanidhi, the Indian Digital
Library of Electronic Theses, is an initiative whose mandate and mission is to
build an online resource of theses submitted to Indian universities. Vidyanidhi
is conceived and implemented in two layers: a top layer of metadata and a
second layer containing the full text of the theses. Given Vidyanidhi's goal of
long-term archiving, Unicode has been chosen as the encoding for the
multilingual, multiscript metadata database. As a pilot, Unicode implementation
for the Hindi and Kannada scripts has been explored. The Vidyanidhi metadata
database currently holds six hundred records in Hindi and five hundred records
in Kannada. This paper demonstrates the practical implementation of Unicode for
the Hindi and Kannada scripts on the Microsoft platform.
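In a multiscript metadata database, each field's script can be identified from the Unicode block its characters fall in: Hindi is written in Devanagari (U+0900..U+097F) and Kannada in the Kannada block (U+0C80..U+0CFF). The following is a minimal sketch, assuming field values are available as Python strings; `script_of` is a hypothetical helper for illustration, not part of Vidyanidhi or of any Microsoft product.

```python
# Unicode block ranges as allocated by the Unicode Standard.
DEVANAGARI = range(0x0900, 0x0980)  # U+0900..U+097F, script used for Hindi
KANNADA = range(0x0C80, 0x0D00)     # U+0C80..U+0CFF, script used for Kannada


def script_of(text: str) -> str:
    """Hypothetical helper: classify a metadata field by the Indic blocks it uses.

    Returns 'Devanagari', 'Kannada', 'Mixed' (both present) or 'Other'
    (neither present; spaces, digits and Latin letters are ignored).
    """
    found = set()
    for ch in text:
        cp = ord(ch)
        if cp in DEVANAGARI:
            found.add("Devanagari")
        elif cp in KANNADA:
            found.add("Kannada")
    if len(found) == 1:
        return found.pop()
    if len(found) > 1:
        return "Mixed"
    return "Other"


print(script_of("हिन्दी"))  # Devanagari
print(script_of("ಕನ್ನಡ"))   # Kannada
```

Checking code-point ranges needs no font or rendering support, so it works even on a server that cannot display the scripts.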
To set the context, a brief description of the Vidyanidhi Project, including
its vision, goals, objectives and strategies, is presented. This is followed by
a discussion of Indian languages and scripts. A technical narrative then
describes how the Microsoft platform (SQL Server, Windows 2000 Server and Office
XP) was used to implement Unicode for the Hindi and Kannada scripts. The paper
concludes with a discussion of the specific issues and problems encountered in
the entry, display and collation of Indic character sets in Unicode.
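Two properties of Unicode Indic text underlie many entry and comparison problems: one visible syllable (a conjunct) is stored as several code points, and some characters have more than one canonically equivalent encoding. The sketch below illustrates both using Python's standard `unicodedata` module; the helpers `code_points` and `same_entry` are hypothetical names for illustration, and NFC normalization is one common remedy assumed here, not necessarily the approach taken in Vidyanidhi.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List each stored code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_entry(a: str, b: str) -> bool:
    """Compare two entries after NFC normalization, so canonically
    equivalent spellings (e.g. precomposed vs. nukta sequences) match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# One visible conjunct (KSSA) is three code points: KA + VIRAMA + SSA.
for line in code_points("क्ष"):    # Devanagari
    print(line)
for line in code_points("ಕ್ಷ"):    # Kannada
    print(line)

# Precomposed QA (U+0958) and KA + NUKTA (U+0915 U+093C) render alike
# but differ as stored strings until both are normalized.
print("\u0958" == "\u0915\u093c")            # False
print(same_entry("\u0958", "\u0915\u093c"))  # True
```

Normalization only makes equality reliable; it does not give dictionary order. Sorting Hindi or Kannada entries correctly generally needs a linguistic collation, such as the Unicode Collation Algorithm with a suitable tailoring, rather than raw code-point comparison.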