24th Internationalization and Unicode Conference

Common XML Locale Repository

Mark Davis - IBM Corporation

Intended Audience:	Software Engineers
Session Level:	Beginner, Intermediate, Advanced

In the internationalization arena, Unicode has provided a lingua franca for communicating textual data. But there remain differences in the locale data used for a variety of tasks, such as formatting dates and times according to the conventions of different languages. Many of those differences are simply gratuitous: each variant is within acceptable limits for human readers, yet the systems still produce different results. In many other cases there are outright errors.

Whatever the cause, the differences can let discrepancies creep into a heterogeneous system. This is especially serious in the case of collation (sort order), where different collations cause not only ordering differences but also different query results. That is, for a query of customers with names between "Abbot, Cosmo" and "Arnold, James", systems with different sort orders can return very different lists.
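The effect on range queries can be illustrated with a minimal sketch. It assumes a hypothetical customer list and two stand-in orderings: plain code point comparison, and a crude accent- and case-insensitive key built from the Python standard library. Neither is a real locale collation, and the names `codepoint_key`, `folded_key`, `range_query`, and `CUSTOMERS` are invented for this illustration; the two names from the query above serve as the lower and upper bounds.

```python
import unicodedata


def codepoint_key(name: str) -> str:
    """Order strings by raw code point value (binary comparison)."""
    return name


def folded_key(name: str) -> str:
    """Crude stand-in for a language-aware collation (an assumption,
    not a real collation algorithm): strip accents and ignore case."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def range_query(names, low, high, key):
    """Return the names that fall between low and high under the given key."""
    lo, hi = key(low), key(high)
    return sorted((n for n in names if lo <= key(n) <= hi), key=key)


# Hypothetical customer data, for illustration only.
CUSTOMERS = [
    "Abbot, Cosmo",
    "\u00c1bel, R\u00e9ka",  # accented initial letter
    "adams, pat",            # lowercase entry
    "Allen, Kim",
    "Arnold, James",
    "Baker, Lee",
]

binary_hits = range_query(CUSTOMERS, "Abbot, Cosmo", "Arnold, James", codepoint_key)
folded_hits = range_query(CUSTOMERS, "Abbot, Cosmo", "Arnold, James", folded_key)

print(len(binary_hits), binary_hits)
print(len(folded_hits), folded_hits)
```

Under the code point ordering the query returns three customers; under the folded ordering it returns five, because the accented and lowercase entries now sort inside the range. The same query against the same data therefore yields different answers depending only on the sort order each system uses.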

The Common XML Locale Repository is a project to define an XML format for the exchange of culturally sensitive (locale) information used in application and system development, and to gather, store, and make available data generated in that format. The project is a joint effort among members of the Linux Application Development Environment (LADE) Workgroup of the Free Standards Group's OpenI18N (formerly known as the Linux Internationalization Initiative, or Li18nux) team.

This paper describes the goals and features of the Common XML Locale Repository project and summarizes the latest changes in the project up to this point. It gives an overview of the XML format for locale data exchange, the current status of the Repository, the comparison of existing data from different platforms, and the process of vetting data to produce a unified set of locale data.

When the world wants to talk, it speaks Unicode

International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

30 May 2003, Webmaster