Migrating Software to Supplementary Characters
Intended Audience: Managers, Software Engineers, Systems Analysts, Technical Writers
Session Level: Intermediate, Advanced
Until recently, it was not necessary for software to deal with
supplementary code points, those from U+10000 to U+10FFFF. With the
assignment of over 40,000 supplementary characters in Unicode 3.1 and the
definition of new national codepage standards that map to these new
characters, it is important to modify BMP-only software to handle the full
range of Unicode code points.
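
As a minimal sketch (not taken from the presentation), the arithmetic below shows how one supplementary code point maps to a pair of 16-bit code units in UTF-16. The class and method names are hypothetical and chosen only for illustration; the JDK already provides equivalent helpers such as Character.toChars and Character.toCodePoint.

```java
class SurrogateMath {
    // Split a supplementary code point (U+10000..U+10FFFF) into a
    // lead and trail surrogate. Hypothetical helper for illustration.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char lead = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char trail = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits
        return new char[] { lead, trail };
    }

    // Recombine a lead/trail surrogate pair into one code point.
    static int fromSurrogates(char lead, char trail) {
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1D50A);
        // prints "U+1D50A -> D835 DD0A"
        System.out.printf("U+1D50A -> %04X %04X%n", (int) pair[0], (int) pair[1]);
    }
}
```

BMP-only software that treats each of these two code units as a separate character is the kind of code that needs to change.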
Typically, only a small percentage of code needs to be changed. The changes
mostly affect low-level handling of 16-bit code units and data structures
containing per-character data.
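
The following sketch illustrates the kind of low-level loop that typically needs attention. It is an assumption-laden example rather than code from the presentation: the class and method names are hypothetical, and it uses standard JDK methods (String.codePointAt, Character.charCount) instead of the ICU helpers the session covers.

```java
class CodePointLoop {
    // BMP-only version: tests each 16-bit code unit on its own, so the
    // two halves of a surrogate pair are never recognized as a letter.
    static int countLettersBmpOnly(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Supplementary-aware version: reads a whole code point and then
    // advances by one or two code units as appropriate.
    static int countLetters(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "a", U+20000 (a CJK Extension B ideograph), "b"
        String s = "a\uD840\uDC00b";
        System.out.println(countLettersBmpOnly(s)); // prints 2
        System.out.println(countLetters(s));        // prints 3
    }
}
```

Only the loop body and the index increment differ between the two versions, which is consistent with the point that a small fraction of code usually needs to change.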
This presentation discusses the changes required to handle all of Unicode
rather than just the BMP subset, concentrating on 16-bit Unicode (UTF-16),
the most common processing form. It describes techniques for finding the
small percentage of code that typically needs to be changed, and shows how
to modify such code. Detailed examples for Java and C/C++ use the many
helper functions from ICU to illustrate practical solutions.