Unicode Extends Chinese, Japanese, Korean (CJK)
Character Database
Version 4.0.1 of the Unicode® Standard Released
Mountain View, CA, March 31, 2004 — The Unicode® Consortium
announced today a new update of the Unicode Standard,
Version 4.0.1, a significant revision of its Unicode Character Database, widely
used in software products. No new characters are added to the standard at this
time—the total number of characters still stands at 96,382 for the world's
scripts and collections of symbols. However, the information in the Character
Database has been refined to improve the quality of text processing in all
languages of the world.
This version of the Unicode Character Database includes the first major
update of the CJK database (Unihan) in two years. The Unihan Database provides character properties, definitions,
pronunciations, mappings, and other information for the CJK characters in the
standard—the characters used in particular for Chinese, Japanese, and Korean.
This update contains thousands of additions and corrections, among them major new correlations with traditional
Chinese and Japanese dictionary sources.
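For readers who work with the Unihan data directly, the following is a minimal sketch of how such per-character information might be read. It assumes the tab-separated layout used by Unihan data files (code point, property name, value). The sample lines and the helper name parse_unihan are illustrative assumptions, not an excerpt from the 4.0.1 release.

```python
from collections import defaultdict

# Illustrative sample in the tab-separated layout of Unihan data files:
# code point, property name, value. These lines are assumptions made
# for this sketch, not copied from the 4.0.1 data.
SAMPLE = """\
# comment lines start with '#'
U+4E00\tkDefinition\tone; a, an; alone
U+4E00\tkTotalStrokes\t1
U+4E8C\tkDefinition\ttwo; twice
"""


def parse_unihan(text):
    """Return {character: {property: value}} from Unihan-style lines."""
    entries = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split into at most three fields; the value may contain spaces.
        code_point, prop, value = line.split("\t", 2)
        # "U+4E00" -> the character with that code point.
        char = chr(int(code_point[2:], 16))
        entries[char][prop] = value
    return dict(entries)


records = parse_unihan(SAMPLE)
print(records["\u4e00"]["kDefinition"])
```

Keying the result by character rather than by code point string is only a convenience for lookups; a real application may prefer to keep the code point as written in the file.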
This version of the Unicode Standard significantly improves the ability to
interchange text in languages such as Arabic, Hebrew, Urdu, and
Pashto. It also clarifies the implementation of such languages as
Bengali and the relationship between base form letters and accent marks.
Full technical details regarding the Unicode Standard, Version
4.0.1, are published online at
http://www.unicode.org/versions/Unicode4.0.1/.
The book version of the Unicode Standard, Version 4.0, which
Version 4.0.1 amends, was published by Addison-Wesley in
September of 2003 (ISBN 0-321-18578-1). For full information,
including the online edition and the book order form, see
http://www.unicode.org/versions/Unicode4.0.0/.
About the Unicode Standard
The Unicode Standard provides a uniform architecture and encoding for all
languages of the world, with over 95,000 characters currently encoded. Unicode
is a fundamental component for providing seamless data interchange around the
world, and has been adopted by such industry leaders as Adobe, Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
Unicode is required by modern standards such as XML, Java, C#, ECMAScript
(JavaScript), LDAP, CORBA 3.0, and WML, and is the official way to implement
ISO/IEC 10646. It is supported in many operating systems, all modern browsers,
and many other products. For additional information on Unicode or the Unicode
Consortium, please visit
http://www.unicode.org.
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop,
extend and promote use of the Unicode Standard, which specifies the
representation of text in modern software products and standards. The consortium
works very closely with the INCITS L2 committee (http://incits.org/incits/tc_home/l2.htm)
and with ISO/IEC JTC 1/SC 2.
The Unicode Standard is a major component in the globalization of e-business,
as the marketplace continues to demand technologies that enhance seamless data
interchange throughout companies' extended -- and often international -- network
of suppliers, customers and partners. Unicode is the default text representation
in XML, an important open standard being rapidly adopted throughout e-business
technology.
The membership of the consortium represents a broad spectrum of corporations
and organizations in the computer and information processing industry. Full
members (the highest level) are: Adobe Systems, Apple Computer, Basis
Technology, Government of India Ministry of Information Technology, Government
of Pakistan National Language Authority, HP, IBM, Justsystem, Microsoft, Oracle,
PeopleSoft, RLG, SAP, Sun Microsystems, Sybase.
Membership in the Unicode Consortium is open to organizations and individuals
anywhere in the world who support the Unicode Standard and wish to assist in its
extension and implementation. For additional information on Unicode, please
contact the Unicode Consortium (http://www.unicode.org/).