What's New in Unicode 4.0
Mark Davis - IBM Corporation
Unicode 4.0.0 is the newest major version of the Unicode Standard, and it includes a significant update of the widely used Unicode Character Database. Version 4.0 defines over 96,000 characters for the languages of the world, and provides detailed properties and algorithms for computer systems. The current release contains all the information needed to update software to support the latest characters.

As a significant step towards the digital preservation of world heritage, this new version encodes characters for Linear B and other ancient Mediterranean alphabets. At the same time, it expands support for modern minority languages, removing a major barrier that has prevented people from using their own languages on computers.

The text of the standard and of the Unicode Standard Annexes has undergone substantial revision. In particular, the Unicode Character Encoding Model is incorporated, resulting in fully specified definitions and conformance requirements for UTF-8, UTF-16, and UTF-32. These are also clearly contrasted with the in-process use of Unicode strings. Other changes affect program identifiers, bidirectional (bidi) text, line breaking and other boundaries, case conversion and detection, and scripts. In this version, 1,226 new character assignments were made beyond those in Unicode 3.2.

In the Unicode Character Database, this version introduces the concept of provisional properties, clarifies the relationships between properties, and provides precisely defined fallback properties for characters not explicitly defined in the data files. A number of corrections to properties were also incorporated, and the UCD documentation was combined and improved.

This presentation discusses the changes to the text of the standard, and outlines the changes made in the Unicode Character Database and in the Unicode Standard Annexes.
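As a small illustration of the three encoding forms named in the abstract (UTF-8, UTF-16, and UTF-32), the following Python sketch counts how many code units each form needs for a single character. It is not taken from the presentation; the helper name `code_units` is hypothetical, and the sample code point U+10000 is assumed here as a representative Linear B character lying outside the Basic Multilingual Plane.

```python
def code_units(text: str) -> dict:
    """Count the code units needed for `text` in each Unicode encoding form.

    `code_units` is an illustrative helper, not part of any standard API.
    Big-endian codecs are used so that no byte order mark is prepended.
    """
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(text.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(text.encode("utf-32-be")) // 4,  # 32-bit code units
    }


# U+10000 sits at the start of the Linear B Syllabary block (assumed sample).
linear_b = "\U00010000"

for form, count in code_units(linear_b).items():
    print(f"{form}: {count} code unit(s)")
```

A supplementary character such as this one takes four code units in UTF-8, two in UTF-16 (a surrogate pair), and one in UTF-32, whereas an ASCII letter takes one code unit in every form.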
When the world wants to talk, it speaks Unicode
International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS).
GMS is pleased to be able to offer the International Unicode Conferences under an exclusive
license granted by the Unicode Consortium. All responsibility for conference finances and
operations is borne by GMS. The independent conference board serves solely at the pleasure
of GMS and is composed of volunteers active in Unicode and in international software
development. All inquiries regarding International Unicode Conferences should be addressed
to info@global-conference.com.
Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission. 30 May 2003, Webmaster