From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 10 2006 - 16:10:22 CST
KKM asked:
> I've got a problem to understand how it is possible to encode Hex10FFFF
> characters with UTF-16. If I try to calculate the range of UTF-16 I always get a
> maximum number of Hex10F7FF.
> 
> Calculation:
> 
> (DBFF - D7FF) * (DFFF - DBFF) +   D7FF   +    FFFF - DFFF
> (High Surr.)    (Low Surr.)    (0 to D7FF)   (D800 to FFFF)
> 
> Please tell me how to encode Hex10FFFF characters.
You cannot encode 0x10FFFF characters, precisely because
the surrogate code points are not available for encoding
characters.
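To make the arithmetic concrete, here is a rough sketch in Python (not part of the original question). It redoes the count with inclusive ranges: "0 to D7FF" holds 0xD800 values, not 0xD7FF, so the total comes to 0x10F800 rather than 0x10F7FF. That total is a *count* of values UTF-16 can represent, while U+10FFFF is the *highest code point*, reached by the last surrogate pair. The helper name to_surrogate_pair is made up for this illustration.

```python
# Sketch only: count what UTF-16 can represent, then encode U+10FFFF.
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF     # low (trailing) surrogates

high_count = HIGH_LAST - HIGH_FIRST + 1  # 0x400
low_count = LOW_LAST - LOW_FIRST + 1     # 0x400

# Every high/low combination addresses one supplementary code point.
supplementary = high_count * low_count   # 0x100000

# BMP code points that are not surrogates: 0..D7FF plus E000..FFFF.
bmp_non_surrogate = 0x10000 - (high_count + low_count)  # 0xF800

scalar_values = supplementary + bmp_non_surrogate       # 0x10F800


def to_surrogate_pair(cp):
    """Hypothetical helper: split a supplementary code point into a pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000                 # 20-bit value
    return HIGH_FIRST + (offset >> 10), LOW_FIRST + (offset & 0x3FF)


high, low = to_surrogate_pair(0x10FFFF)
print(hex(scalar_values), hex(high), hex(low))  # 0x10f800 0xdbff 0xdfff
```

So the highest code point, U+10FFFF, is the pair <DBFF DFFF>, and the number of values UTF-16 can represent is 0x10F800 (1,112,064).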
There are 0x110000 *code points* (U+0000 through U+10FFFF).
Some of those code points (surrogate code points, noncharacter
code points) are unavailable for encoding characters. Others
are assigned to usages (private use) that also prevent
encoding characters there.
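As a rough tally of those categories, the following Python sketch subtracts the surrogate, noncharacter, and private-use ranges from the full codespace, counting U+0000 so that the codespace holds 0x10FFFF + 1 code points. The variable names are mine, and the ranges are the fixed ones defined by the standard; treat this as an illustration, not a substitute for the table in the standard.

```python
# Sketch only: how many code points are left once the unavailable
# categories are set aside.
TOTAL = 0x10FFFF + 1                      # 1,114,112 code points, U+0000 included

SURROGATES = 0xDFFF - 0xD800 + 1          # 2,048

# Noncharacters: U+FDD0..U+FDEF plus the last two code points of each
# of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * 17   # 66

# Private use: U+E000..U+F8FF, plus planes 15 and 16 without their
# two trailing noncharacters (U+F0000..U+FFFFD, U+100000..U+10FFFD).
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (0xFFFD + 1)  # 137,468

remaining = TOTAL - SURROGATES - NONCHARACTERS - PRIVATE_USE
print(TOTAL, SURROGATES, NONCHARACTERS, PRIVATE_USE, remaining)
# 1114112 2048 66 137468 974530
```

The remaining 974,530 code points cover both those already assigned and those still reserved for future encoding.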
For a better picture of what is and is not available, and how
many characters *can* be encoded in the future, see Table D-4
in the Unicode Standard:
http://www.unicode.org/versions/Unicode4.0.0/appD.pdf
As of Unicode 4.0, there were 878,083 code points reserved
for future encoding.
As of Unicode 5.0, there are currently 875,441 code points reserved
for future encoding.
--Ken