Unicode Support in Oracle9i Database: New Unicode Features
Intended Audience: Manager, Software Engineer, Systems Analyst, Marketer
Session Level: Intermediate
As global ebusiness grows into every aspect of industry as a
fundamental infrastructure for information and business management,
it is becoming crucial to develop internal applications with
multilingual capabilities. Unicode is widely accepted across many
platforms as the standard for providing this multilingual capability.
Oracle has supported Unicode in the UTF-8 encoding since Oracle7. In the
newest Oracle release, Oracle9i, this support is further enhanced to
better serve the needs of global ebusiness. Codepoint semantics is
introduced for text data so that UTF-16 semantics can be built on top
of the UTF-8 encoding, which makes it easy to support application
servers built on UTF-16. This solution reduces migration effort and
increases storage efficiency for Latin data.
A Unicode data type is introduced for building Unicode applications
independent of the database character set. This data type allows
existing applications to be migrated to Unicode gradually, and it
offers a choice of UTF-8 or UTF-16 as its encoding so that storage
efficiency can be matched to the data distribution.
This paper describes the functionality of codepoint semantics and the
new Unicode data type in the Oracle9i database release. It discusses
design choices such as codepoint versus byte semantics, a Unicode
database versus the Unicode data type, and UTF-8 versus UTF-16. A brief
description of the new Unicode access interface rounds out the picture
of the multilingual capabilities offered throughout the Oracle
Development Platform.
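
As a rough illustration of the UTF-8 versus UTF-16 storage trade-off and
of the difference between counting code points and counting bytes, the
following Python sketch may help. It is not Oracle code: the sample
strings and the helper name storage_profile are assumptions chosen for
this example, and Python's len() counts code points, which matches
UTF-16 code units only for characters in the Basic Multilingual Plane.

```python
# Illustrative sketch only; not part of any Oracle interface.
# Sample strings are assumptions: one Latin word and one Japanese
# (katakana) word, written with escapes to avoid normalization issues.
samples = {
    "latin": "database",
    "japanese": "\u30c7\u30fc\u30bf\u30d9\u30fc\u30b9",
}


def storage_profile(text):
    """Return the code point count and encoded byte sizes for one string."""
    return {
        "code_points": len(text),                      # length in code points
        "utf8_bytes": len(text.encode("utf-8")),       # 1 to 4 bytes per code point
        "utf16_bytes": len(text.encode("utf-16-be")),  # big-endian, no byte order mark
    }


for name, text in samples.items():
    print(name, storage_profile(text))
```

Under these assumptions the Latin sample takes 8 bytes in UTF-8 and 16
in UTF-16, while the Japanese sample takes 18 bytes in UTF-8 and 12 in
UTF-16. The code point count (8 and 6 respectively) is the same in
either encoding, which is why a length limit expressed in code points
behaves differently from one expressed in bytes.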