Re: Private Use Surrogate Pairs (128x1024 - 4)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Mon May 13 2002 - 06:31:22 EDT


The 128 comes from the 128 possible combinations of 0 and 1 in the seven
least significant bits of a high surrogate in the range U+DB80 through
U+DBFF. The 1024 comes from the 1024 possible combinations of 0 and 1 in
the ten least significant bits of a low surrogate in the range U+DC00
through U+DFFF.

Consider any of the 128 high surrogates and any of the 1024 low surrogates
that are mentioned above. Take the ten least significant bits of that high
surrogate as a number and multiply that number by 1024. To that result, add
the ten least significant bits of the low surrogate, then add the decimal
number 65536 (hexadecimal 10000), which is, in binary, 1 followed by 16
zeros. For these 128 high surrogates the result is always in the range
U+F0000 through U+10FFFF.
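
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration of the calculation as I have described it; the function name
decode_surrogate_pair is my own choice and not something taken from the
Unicode specification.

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Ten low bits of the high surrogate times 1024, plus the ten low
    # bits of the low surrogate, plus 65536.
    return (high & 0x3FF) * 1024 + (low & 0x3FF) + 65536


# The two extremes of the private use range discussed above.
first = decode_surrogate_pair(0xDB80, 0xDC00)
last = decode_surrogate_pair(0xDBFF, 0xDFFF)
print(f"U+{first:X} .. U+{last:X}")  # U+F0000 .. U+10FFFF
```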

Section 13.4 of chapter 13 of the Unicode specification and section 3.7 of
chapter 3 of the Unicode specification have details of the method.

When I first met the idea of surrogates they seemed very complicated, and
the addition of 65536 seemed strange. However, I have since come to the
view that it is a very clever method of adding sixteen extra planes, each
of 65536 code points, without any code point produced using surrogates
duplicating a code point that is already in the basic 16-bit plane. Thus
Unicode has seventeen planes each of 65536 code points: the original plane
together with an additional sixteen planes. However, the last two code
points of each plane, namely those ending in hexadecimal FFFE and
hexadecimal FFFF, are not available for characters.
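
As a rough check on the figures, including the "128x1024 - 4" of the subject
line, the counts can be worked out as follows. This is a sketch only: it
assumes the full set of 1024 high surrogates, U+D800 through U+DBFF, when
counting planes, and the variable names are my own.

```python
PLANE_SIZE = 65536

# 1024 high surrogates x 1024 low surrogates give sixteen extra planes,
# which together with the original plane makes seventeen.
total_planes = 1 + (1024 * 1024) // PLANE_SIZE

# Code points reachable from the 128 private use high surrogates.
private = range(0xF0000, 0x10FFFF + 1)

# The two code points at the end of each plane, ending FFFE and FFFF.
excluded = [cp for cp in private if cp & 0xFFFF in (0xFFFE, 0xFFFF)]
usable = len(private) - len(excluded)

print(total_planes)                      # 17
print([f"U+{cp:X}" for cp in excluded])  # ['U+FFFFE', 'U+FFFFF', 'U+10FFFE', 'U+10FFFF']
print(usable)                            # 131068, that is 128 x 1024 - 4
```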

William Overington

13 May 2002


