UnicodeIUC20
Abstract

Design and Implementation of a Suite of Chinese Transcoders for Python 2

Thomas Emerson - Basis Technology Corporation

Intended Audience: Software Engineers, Content Developers
Session Level: Intermediate, Advanced

With the release of Python 2.0 in October 2000, Unicode strings became a fundamental datatype in the language. A new module, codecs, provides support for registering new encoding converters that transcode between Unicode and legacy encodings. Codecs are provided for the ISO 8859-n family of 8-bit encodings, but none are provided for the Asian multibyte encodings.
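
A minimal sketch of the registration mechanism the codecs module provides, written for modern Python 3 rather than the Python 2 API the talk targets. A search function is registered with codecs.register() and returns a codecs.CodecInfo for the encoding names it recognizes. The codec name euccndemo is hypothetical. The GB 2312 table work is delegated to the standard library's gb2312 (EUC-CN) codec, which Python 2.0 did not have; a real transcoder would supply its own mapping tables.

```python
import codecs

# Hypothetical codec name for this sketch. codecs.lookup() passes search
# functions a normalized, lower-cased name, so a plain alphanumeric name
# avoids any normalization surprises.
DEMO_NAME = "euccndemo"


def _encode(text, errors="strict"):
    # Stateless encoder: returns (bytes, number of characters consumed).
    # The GB 2312 table lookup is delegated to the built-in "gb2312" codec.
    return codecs.encode(text, "gb2312", errors), len(text)


def _decode(data, errors="strict"):
    # Stateless decoder: returns (str, number of bytes consumed).
    # bytes() also accepts the memoryview that bytes.decode() may pass in.
    return codecs.decode(bytes(data), "gb2312", errors), len(data)


def _search(name):
    # A search function returns a CodecInfo for names it owns, else None,
    # so that other registered search functions can be tried.
    if name == DEMO_NAME:
        return codecs.CodecInfo(_encode, _decode, name=DEMO_NAME)
    return None


codecs.register(_search)
```

Once the search function is registered, calls such as `"中文".encode("euccndemo")` and `data.decode("euccndemo")` are routed through `_encode` and `_decode`.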

The Python Codecs project, , is underway to supplement the standard set of encodings with the legacy CJK encodings. This presentation describes the design and implementation of a single unified transcoding framework for a wide range of Chinese encodings, including:

  • EUC-CN, EUC-TW
  • HZ
  • ISO-2022-CN, ISO-2022-CN-EXT
  • GB 18030
  • Big 5 and its variants, including HKSCS

It is expected that this framework will scale to support other multibyte 8-bit Asian encodings.
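
Several encodings in the list above share the GB 2312 character set and differ only in how its two-byte codes are framed. EUC-CN sets the high bit of each byte, while HZ (RFC 1843) keeps both bytes 7-bit and switches modes with `~{` and `~}` escapes. The sketch below assumes Python 3 and again borrows the standard library's gb2312 codec for the table lookup. It decodes HZ by restoring the high bits and reusing that shared table. This illustrates one way a unified framework can factor out common tables; it is not the talk's implementation. `hz_decode` is a hypothetical name, and the sketch is stateless, with no incremental decoding across chunk boundaries.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ (RFC 1843) bytes to str; hypothetical helper for this sketch."""
    out = []
    gb_mode = False  # False: ASCII mode, True: GB 2312 mode
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b & 0x80:
            raise ValueError(f"HZ is 7-bit; byte {i} has the high bit set")
        if b == 0x7E:  # '~' starts a two-byte escape in either mode
            nxt = data[i + 1] if i + 1 < n else None
            if not gb_mode and nxt == 0x7E:    # "~~" -> literal '~'
                out.append("~")
            elif not gb_mode and nxt == 0x7B:  # "~{" -> enter GB mode
                gb_mode = True
            elif not gb_mode and nxt == 0x0A:  # "~\n" -> line continuation, dropped
                pass
            elif gb_mode and nxt == 0x7D:      # "~}" -> return to ASCII mode
                gb_mode = False
            else:
                raise ValueError(f"invalid HZ escape at byte {i}")
            i += 2
        elif gb_mode:
            if i + 1 >= n or data[i + 1] & 0x80:
                raise ValueError(f"bad GB 2312 pair at byte {i}")
            # Setting the high bit of both bytes turns the 7-bit GB 2312
            # code into EUC-CN, so the shared table (here: Python's
            # built-in "gb2312" codec) can be reused unchanged.
            pair = bytes((b | 0x80, data[i + 1] | 0x80))
            out.append(pair.decode("gb2312"))
            i += 2
        else:
            out.append(chr(b))  # plain ASCII byte
            i += 1
    return "".join(out)
```

Modern Python also ships a built-in `hz` codec, so the sketch can be checked against round trips such as `hz_decode(text.encode("hz")) == text`.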


International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

22 September 2001, Webmaster