Teradata's Phased Implementation of Unicode
Intended Audience: Software Engineer, Data Base Administrator
Session Level: Intermediate
Purpose
Describe a successful phased approach to Unicode support implemented for the
Teradata very large database engine.
Description
In 1984, Teradata produced the first commercial highly parallel database engine.
Since that time, many features have been added, including increasing
internationalization support. When the decision was made to support Unicode,
the database engine code numbered in the millions of lines, much of it
assuming particular representations for character data. It was clear that
adding Unicode support would be no easy task. To avoid concentrating more
resource demand and risk than a single release could absorb, a phased approach
was employed. Each
phase could be justified on its own merit, and acted as the groundwork for the
phase to follow. Phase 1 of the Unicode implementation involved the
construction of a Unicode datatype internal to the database engine. The
byproduct and justification was the ability to interchange Japanese data
between heterogeneous clients with differing encodings for Japanese. Phase 2
constructed a method for defining external single byte characters, allowing
appropriate character support for almost every nation in the world. Phase 3
externalizes the Unicode support, banishing the need for translation of incoming
user data. Phase 4 regularizes Unicode support throughout the database engine,
allowing fully internationalized object names. While the precise divisions
between the phases may differ when planning Unicode support for other large
existing applications, the Teradata experience provides a model for such
efforts.
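
The Phase 1 idea, using an internal Unicode representation as the common form between clients with different Japanese encodings, can be illustrated with a minimal sketch. This is not Teradata's implementation; the function name convert_via_unicode is hypothetical, and Python's built-in shift_jis and euc_jp codecs stand in for whatever client encodings the engine actually supported.

```python
def convert_via_unicode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode client data by passing through a Unicode intermediate form.

    Hypothetical helper for illustration only: one client's bytes are decoded
    to Unicode, and the Unicode text is then encoded for another client.
    """
    text = data.decode(source_encoding)   # client bytes -> internal Unicode
    return text.encode(target_encoding)   # internal Unicode -> other client's bytes


# A Japanese string as a Shift_JIS client would send it.
original_text = "\u65e5\u672c\u8a9e"  # the three characters of "Japanese language"
sjis_bytes = original_text.encode("shift_jis")

# The same data as an EUC-JP client would expect to receive it.
eucjp_bytes = convert_via_unicode(sjis_bytes, "shift_jis", "euc_jp")
```

With a single Unicode intermediate form, each client encoding needs only one pair of conversions (to and from Unicode) rather than a separate conversion for every pair of client encodings.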
Conclusion
A phased approach to Unicode support within a large existing application offers
an attractive alternative to a potentially overwhelming attempt to offer full
support within the span of a single release.