UnicodeIUC19
Unicode Standard Conference Board Past Conferences Call for Papers Sponsors Showcase
Registration Accommodation Travel Program Talks and Papers Next Conference
Abstract

Case Study: Porting an NLP Application to Unicode

Nicolas Auclerc - ATR-SLT

Intended Audience: Manager, Software Engineer
Session Level: Beginner, Intermediate

TOPICS OF INTEREST:

Language processing issues with Unicode data; could also be: Migrating legacy applications to Unicode

Abstract:

Our natural language processing research group has decided to adopt Unicode for all of its data. Consequently, while rewriting one of our existing applications, a graphical tool for tree-banking with parsing aids, we had to add Unicode support. The original application was intended only for English (1-byte) and Japanese (2-byte) text. The new specification also called for a redesign as a client/server application. We "killed two birds with one stone" by using Java, which eased both the use of Unicode and the implementation of the client/server communication.

The introduction of Unicode allowed us to simplify the existing C code on the server side: only the 2-byte code path had to be adapted to Unicode, and the old 1-byte code path could simply be discarded. Using Unicode also makes the tools integrated in the server language-independent, which is a new feature. This is a valuable investment for the future, when we plan to extend our research to Korean.
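The idea of a single 2-byte code path can be illustrated with a minimal Java sketch. It assumes the client sends text to the server as UTF-16BE, which the abstract does not state; the class name WireText and its methods are hypothetical and are not taken from the application described here. Under that assumption, English, Japanese, and Korean characters in the Basic Multilingual Plane all occupy exactly two bytes each, so a server needs only one fixed-width handling routine for them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the application described in the abstract.
// Assumption: the wire encoding is UTF-16BE.
class WireText {
    // Encode text so that every BMP character becomes exactly two bytes.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Decode a UTF-16BE payload back into a Java string.
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String[] samples = {
            "tree",          // English
            "\u6728",        // Japanese: one kanji
            "\uB098\uBB34"   // Korean: two Hangul syllables
        };
        for (String s : samples) {
            byte[] payload = encode(s);
            // Each sample uses two bytes per character, whatever the language.
            System.out.println(s.length() + " chars, " + payload.length + " bytes");
        }
    }
}
```

Characters outside the Basic Multilingual Plane would take four bytes (a surrogate pair) in this encoding, so a real server would still have to decide how to treat them.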

Another benefit of rewriting our application as a client/server application in Java is that all input methods on the client side now support Unicode. As a consequence, the interface has been relieved of the task of managing character input and, in part, layout. Hence, supporting input and output for a new language such as Korean does not require recompiling the client code.



International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

22 Jun 2001, Webmaster