Unicode Character Database 5.0 Released
Mountain View, CA, July 18, 2006 -- The Unicode® Consortium announces the release of a significant update of its widely used
Unicode Character Database (UCD). The new version, Version 5.0, defines more than 99,000 characters for the languages of the world
and provides the detailed properties needed for computer software implementations. This latest version of the UCD contains all the
information needed to update software to support the characters and algorithms that are the foundation for all modern computer programs,
including the latest data for Unicode security mechanisms, collation, and locales.
For the first time, the Unicode Collation Algorithm (UCA) is released in
parallel with the UCD: UCA Version 5.0 and UCD Version 5.0 are available
simultaneously, enabling default collation for all of the more than 99,000
characters. For more information on UCA 5.0, see
http://www.unicode.org/reports/tr10/.
Implementers can now update their software more quickly
to fully support minority languages, improved Indic
processing, and IICore, the newly published subset of the most useful
Chinese characters for mobile and small applications.
Version 5.0 of the UCD underpins the Common Locale Data Repository (CLDR) Version 1.4, which
now supports 360 locales (121 languages and 142 territories). The systematization and extension of
character properties will enable improved text processing for all CLDR locales.
In this latest version, the UCD data provides dependable caseless matching through stable case
folding operations. Version 5.0 data is also the basis for better interoperability for bidirectional
scripts (such as Arabic and Hebrew), line breaking, and text segmentation. With the release of this data,
implementers can begin to move their software to Version 5.0, in anticipation of the Unicode Version 5.0
support that will ship with many products and libraries, including Windows Vista, ICU, and offerings from
Google, Yahoo!, and many other companies.
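As a rough illustration of caseless matching by case folding, the short Python sketch below compares strings by their folded forms. It is only a sketch: the helper name caseless_equal is hypothetical, and Python's built-in str.casefold() uses the Unicode data bundled with the interpreter, which is not necessarily Version 5.0.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Return True if a and b match after full Unicode case folding.

    Hypothetical helper for illustration; it relies on str.casefold(),
    which applies the case folding data shipped with the Python runtime.
    """
    return a.casefold() == b.casefold()


# Case folding maps the German sharp s to "ss", so these two spellings match.
print(caseless_equal("Straße", "STRASSE"))

# Plain lowercasing leaves the sharp s unchanged, so the same pair differs.
print("Straße".lower() == "STRASSE".lower())

# Greek final sigma (U+03C2) and capital sigma (U+03A3) fold to the same letter.
print(caseless_equal("\u03c2", "\u03a3"))
```

Comparing folded forms rather than lowercased forms is the design choice that matters here: folding is defined for matching, whereas lowercasing is defined for display and can leave equivalent spellings unequal.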
The release of Version 5.0 of the UCD is the first step in the release of The Unicode Standard, Version 5.0
— the book (ISBN 0-321-48091-0) will be published by Addison-Wesley in the fourth quarter of 2006. For more information
about Unicode 5.0 and the Unicode Character Database, see
http://www.unicode.org/versions/Unicode5.0.0/.
The latest features of Unicode Version 5.0 will also be showcased at the 30th Internationalization and Unicode
Conference (IUC) on November 17-19, 2006 in Washington, D.C. -- see
http://www.unicodeconference.org/.
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop,
extend and promote use of the Unicode Standard and related
globalization standards. The membership of the consortium represents
a broad spectrum of corporations and organizations in the computer
and information processing industry: Adobe Systems, L'Agence
intergouvernementale de la Francophonie, Apple Computer, Basis
Technology, Denic e.G., Google, Government of India - Ministry of
Information Technology, Government of Pakistan - National Language
Authority, HP, IBM, Justsystem, Microsoft, Monotype Imaging, Oracle,
SAP, Sun Microsystems, Sybase, The University of California at
Berkeley, Yahoo!, plus well over a hundred Associate, Liaison, and
Individual members.
For more information, please contact the Unicode Consortium at
http://www.unicode.org/.