From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 10 2006 - 16:10:22 CST
KKM asked:
> I've got a problem to understand how it is possible to encode Hex10FFFF
> characters with UTF-16. If I try to calculate the range of UTF-16 I always get a
> maximum number of Hex10F7FF.
>
> Calculation:
>
> (DBFF - D7FF) * (DFFF - DBFF) + D7FF + FFFF - DFFF
> (High Surr.) (Low Surr.) (0 to D7FF) (D800 to FFFF)
>
> Please tell me how to encode Hex10FFFF characters.
You cannot encode 0x10FFFF characters, precisely because
the surrogate code points are not available for encoding
characters.
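To make the role of the surrogates concrete, here is a minimal
Python sketch of how UTF-16 represents a single code point. The
function name utf16_code_units is made up for illustration and
does not come from any library; the arithmetic is the usual
surrogate-pair mapping.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for UTF-16 itself,
        # so they cannot stand for characters of their own.
        raise ValueError("surrogate code point, not encodable")
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit code unit
    offset = cp - 0x10000                # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return [high, low]


top = utf16_code_units(0x10FFFF)
print(" ".join(f"{unit:04X}" for unit in top))
```

Under this mapping the highest code point, U+10FFFF, is reachable
as the pair DBFF DFFF. What cannot be represented are the code
points D800..DFFF themselves.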
There are 0x110000 *code points* (U+0000 through U+10FFFF).
Some of those code points (surrogate code points, noncharacter
code points) are unavailable for encoding characters. Others
are assigned to usages (private use) that also prevent
encoding characters there.
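The sizes of these categories can be tallied with a few lines of
arithmetic. This is only an illustrative Python sketch: the
variable names are invented here, and the noncharacter and
private-use totals are computed from their defined ranges.

```python
# Whole code space: U+0000..U+10FFFF
TOTAL_CODE_POINTS = 0x10FFFF + 1                 # 1,114,112 (0x110000)

# Surrogate code points: D800..DFFF
SURROGATES = 0xDFFF - 0xD800 + 1                 # 2,048

# What UTF-16 can reach: non-surrogate BMP code points as single
# code units, plus every high/low surrogate pairing.
BMP_NON_SURROGATE = 0x10000 - SURROGATES         # 63,488
SURROGATE_PAIRS = 0x400 * 0x400                  # 1,048,576
utf16_reach = BMP_NON_SURROGATE + SURROGATE_PAIRS

# Noncharacters: FDD0..FDEF, plus the last two code points of each
# of the 17 planes.
NONCHARACTERS = 32 + 2 * 17                      # 66

# Private use: E000..F8FF, plus planes 15 and 16 minus their two
# trailing noncharacters each.
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (0x10000 - 2)   # 137,468

# Code points that could ever hold a publicly assigned character.
assignable = TOTAL_CODE_POINTS - SURROGATES - NONCHARACTERS - PRIVATE_USE

print(hex(utf16_reach), utf16_reach)
print(assignable)
```

The UTF-16 total works out to 0x10F800, one more than the
0x10F7FF in the question, because the range 0000..D7FF contains
0xD800 code points, not 0xD7FF.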
For a better picture of what is and is not available, and how
many characters *can* be encoded in the future, see Table D-4
in the Unicode Standard:
http://www.unicode.org/versions/Unicode4.0.0/appD.pdf
As of Unicode 4.0, there were 878,083 code points reserved
for future encoding.
As of Unicode 5.0, there are currently 875,441 code points reserved
for future encoding.
--Ken