Twenty-second International Unicode Conference

Transcoding: Beyond the Basics

Peter Edberg - Apple Computer, Inc.

Presented by: Deborah Goldsmith - Apple Computer, Inc.

Intended Audience:	Managers, Software Engineers, Systems Analysts
Session Level:	Beginner, Intermediate

Initially, conversion of text from one character encoding to another - and in particular between Unicode and other encodings - may seem straightforward: Just convert characters in one encoding into the equivalent character - if there is one - in another encoding. However, there are many subtle issues involved in deciding when characters in different encodings are equivalent:

Similar characters in different encodings may have a different range of meanings, or different properties. One encoding may use a single character to represent a range of meanings that another encoding uses two or three different characters to represent, or a particular character that is in both encodings may have different directional properties in the two different encodings.
Information that is explicit in one encoding may be implicit in another, where it may depend on context or other state.
How close an equivalence must be often depends on the purpose of a particular transcoding operation, and may range from "identity" (subject to the caveats mentioned above), through canonical, compatibility, or other semantic equivalence, all the way to mere graphic similarity. Encoding conversions need to support different types of equivalence to serve different types of clients properly.
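As a rough illustration of the equivalence levels listed above, the following Python sketch classifies two strings as identical, canonically equivalent, or compatibility equivalent, using the standard library's Unicode normalization forms. The function names `equivalence_level` and `encode_with_fallback` are hypothetical and are not taken from the talk; the fallback strategy shown is only one possible policy, assumed here for illustration.

```python
import unicodedata


def equivalence_level(a: str, b: str) -> str:
    """Classify how closely two strings match, from strictest to loosest.

    Hypothetical helper: it covers only identity, canonical, and
    compatibility equivalence, not graphic similarity.
    """
    if a == b:
        return "identity"
    if unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b):
        return "canonical"
    if unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b):
        return "compatibility"
    return "none"


def encode_with_fallback(text: str, encoding: str) -> bytes:
    """Encode strictly first; loosen the equivalence only if that fails.

    Assumed policy: fall back to compatibility mappings (NFKC), then
    substitute a replacement character for anything still unmappable.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        loosened = unicodedata.normalize("NFKC", text)
        return loosened.encode(encoding, errors="replace")
```

For example, under these assumptions the precomposed "é" (U+00E9) and "e" followed by U+0301 are canonically equivalent, while the ligature U+FB01 matches "fi" only at the compatibility level. A client that needs round-trip fidelity would likely stop at the strict encoding step, whereas a client that only displays text might accept the looser fallback.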

This talk describes many such issues, and presents techniques for handling them.

When the world wants to talk, it speaks Unicode

International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

22 May 2002, Webmaster