Migration to Supplementary Characters - Multilingual Databases in Hong Kong

Linus Toshihiro Tanaka - Oracle Corporation

Intended Audience: Software Engineers, Systems Analysts
Session Level: Beginner, Intermediate, Advanced

Through Unicode 3.0, many characters used in the Hong Kong Special Administrative Region (Hong Kong S.A.R.) had to be handled using the Private Use Area (PUA) of Unicode. According to the Hong Kong Supplementary Character Set - 2001 (HKSCS-2001) specification, there are 1,686 such characters. Because different systems may interpret PUA code points differently, it is difficult to achieve a high level of interoperability when using the PUA.
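
As a rough illustration of the interoperability concern, the following Python sketch scans a string for private-use code points (Unicode general category "Co"). The function name `find_private_use` and the sample string are assumptions made for this example and do not come from the HKSCS-2001 specification.

```python
import unicodedata

def find_private_use(text):
    """Return (index, 'U+XXXX') for every private-use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = Other, private use
    ]

# U+E000 is the first code point of the BMP Private Use Area.
sample = "A\uE000B"
print(find_private_use(sample))  # [(1, 'U+E000')]
```

A check like this only finds private-use code points. It cannot tell which character a given system intended by them, which is exactly why the PUA limits interoperability.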

With Unicode 3.1, 3.2, or 4.0, the situation is different. Of the 1,686 HKSCS-2001 characters that required the PUA under Unicode 3.0, 1,651 are now encoded in Unicode. With the exception of 35 rarely used characters, we can handle characters used in Hong Kong S.A.R. without using the PUA. This is especially important when storing multilingual data in a database, so that characters used in Hong Kong S.A.R. are not misinterpreted as, for example, Japanese characters.

To migrate character data used in Hong Kong S.A.R. from legacy monolingual systems to multilingual systems, we need to use the supplementary characters of Unicode, that is, characters above U+FFFF, which UTF-16 represents as surrogate pairs (hence the informal name "surrogate characters").
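
To make the relationship between a supplementary character and its UTF-16 surrogate pair concrete, here is a minimal Python sketch of the standard UTF-16 arithmetic. The helper name `to_surrogate_pair` is an assumption for this example, and U+20000 is used only because it is the first code point of Plane 2.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00
```

Systems that store text as UTF-16 must treat both 16-bit units as one character when measuring length, truncating, or indexing.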

Migrating many systems may take some time, so in the meantime we need to handle communication between legacy monolingual systems and multilingual systems that use supplementary characters of Unicode.

In this paper, I explain how characters used in Hong Kong S.A.R. are encoded in legacy monolingual systems, how these characters are mapped to and from Unicode, how to migrate character data from legacy monolingual systems to multilingual systems, and how communication is handled between monolingual and multilingual systems.
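
One possible shape for the migration step is sketched below in Python: private-use code points are replaced with standard code points through a lookup table, and any private-use code points left over are reported. This is an illustrative sketch under stated assumptions, not the method of this paper. The function name `migrate_pua` is hypothetical, and the table entry is a placeholder rather than a real HKSCS mapping. Real entries would have to come from the published HKSCS mapping tables.

```python
import unicodedata

# Placeholder entry for illustration only; NOT an actual HKSCS mapping.
EXAMPLE_TABLE = {0xE000: 0x20000}

def migrate_pua(text, table):
    """Replace mapped private-use code points; report those that remain."""
    migrated = text.translate(table)  # code point -> code point substitution
    remaining = [
        f"U+{ord(ch):04X}"
        for ch in migrated
        if unicodedata.category(ch) == "Co"  # still private use after mapping
    ]
    return migrated, remaining

migrated, remaining = migrate_pua("\uE000x\uE001", EXAMPLE_TABLE)
print(remaining)  # ['U+E001']
```

Reporting the leftover code points makes it easy to review characters that have no standard encoding yet and therefore still depend on the PUA.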

Using the methods explained in this paper, databases in Hong Kong S.A.R. can handle multilingual data, including languages such as Japanese. Likewise, databases outside Hong Kong S.A.R. can handle characters used in Hong Kong S.A.R. alongside characters of other languages such as Japanese.