Compact Encodings of Unicode
Intended Audience: Software Engineers, Systems Analysts, Content Developers
Session Level: Beginner, Intermediate
This talk discusses ways to reduce the size of Unicode text in files and
protocols by choosing a compact encoding, optionally combined with
general-purpose compression.
Unicode is often perceived as "too big", increasing text size compared
with traditional codepages. This concern is raised especially for systems
with limited connection bandwidth, such as dial-up or long-range wireless
networks, and for devices with little memory, such as PDAs and cell
phones.
Several Unicode encodings are available, each with different size
characteristics. After an overview of UTF-8, UTF-16, SCSU, and BOCU-1, their
use in different environments is discussed and compared with
general-purpose compression and traditional codepages. Comparison numbers
are presented, based on the ICU implementation. Software support for the
encoding and compression schemes is discussed.
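
To give a feel for the kind of comparison described above, here is a minimal Python sketch that measures how many bytes a short text occupies in UTF-8, UTF-16, and a traditional codepage, and how large the UTF-8 form is after general-purpose compression. It is an illustration only, not the measurement method used for the talk's numbers, which are based on ICU. The function name `encoded_sizes`, the sample text, and the choice of Windows-1251 as the legacy codepage are assumptions made for this example. Python's standard codecs do not include SCSU or BOCU-1, so those two are left out here.

```python
import zlib


def encoded_sizes(text, legacy_codepage):
    """Return the byte count of `text` under several encodings.

    `legacy_codepage` is any Python codec name for a traditional
    codepage (an assumption of this sketch, e.g. "cp1251").
    """
    utf8 = text.encode("utf-8")
    return {
        "utf-8": len(utf8),
        # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" adds.
        "utf-16": len(text.encode("utf-16-le")),
        legacy_codepage: len(text.encode(legacy_codepage)),
        # General-purpose compression (zlib/deflate) applied on top of UTF-8.
        "utf-8 + zlib": len(zlib.compress(utf8, 9)),
    }


if __name__ == "__main__":
    # Hypothetical sample: a short Russian phrase repeated to simulate
    # a longer document.
    sample = "Привет, мир! " * 100
    for name, size in encoded_sizes(sample, "cp1251").items():
        print(f"{name:>14}: {size} bytes")
```

For text this short and repetitive the compressed figure is unrealistically small; any meaningful comparison needs representative documents in each language of interest.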