Re: FW: Subj: How to encode Hex10FFFF characters with UTF-16??

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 10 2006 - 16:10:22 CST


    KKM asked:

    > I've got a problem to understand how it is possible to encode Hex10FFFF
    characters with UTF-16. If I try to calculate the range of UTF-16 I always get a
    maximum number of Hex10F7FF.
    >
    > Calculation:
    >
    > (DBFF - D7FF) * (DFFF - DBFF) + D7FF + FFFF - DFFF
    > (High Surr.) (Low Surr.) (0 to D7FF) (D800 to FFFF)
    >
    > Please tell me how to encode Hex10FFFF characters.
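For reference, here is how the highest code point, U+10FFFF, is written in UTF-16. This is a minimal Python sketch of the standard surrogate-pair arithmetic. The helper name `to_utf16_units` is hypothetical, and the last line cross-checks the result against Python's built-in "utf-16-be" codec.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]                      # BMP: a single code unit, value unchanged
    v = cp - 0x10000                     # 20-bit offset in 0x00000..0xFFFFF
    return [0xD800 + (v >> 10),          # high surrogate carries the top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate carries the bottom 10 bits

units = to_utf16_units(0x10FFFF)
print(" ".join(f"{u:04X}" for u in units))   # DBFF DFFF

# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
assert bytes.fromhex("DBFFDFFF") == chr(0x10FFFF).encode("utf-16-be")
```

So U+10FFFF is encodable; it is simply the last code point, reached by the pair DBFF DFFF.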

    You cannot encode 0x10FFFF characters, precisely because
    the surrogate code points are not available for encoding
    characters.
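The arithmetic in the question is close. The slip is that inclusive ranges need a "+1": U+0000..U+D7FF holds 0xD800 values, not 0xD7FF. A small Python sketch (variable names are illustrative) that redoes the count with inclusive ranges:

```python
# Size of each inclusive range used by UTF-16.
high_surrogates = 0xDBFF - 0xD800 + 1      # 0x400 = 1024 high surrogates
low_surrogates  = 0xDFFF - 0xDC00 + 1      # 0x400 = 1024 low surrogates
bmp_below       = 0xD7FF - 0x0000 + 1      # U+0000..U+D7FF: 0xD800 code points
bmp_above       = 0xFFFF - 0xE000 + 1      # U+E000..U+FFFF: 0x2000 code points

pairs = high_surrogates * low_surrogates   # 0x100000 supplementary code points
encodable = bmp_below + bmp_above + pairs

print(hex(encodable))                      # 0x10f800
print(hex(0x10FFFF + 1 - encodable))       # 0x800, i.e. the 2,048 surrogate code points
```

The total, 0x10F800, is every code point from U+0000 to U+10FFFF except the surrogates. U+10FFFF is the highest of them, not their number.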

    There are 0x110000 *code points* (U+0000 through U+10FFFF);
    0x10FFFF is the highest code point, not the number of them.

    Some of those code points (surrogate code points, noncharacter
    code points) are unavailable for encoding characters. Others
    are set aside for private use, which likewise prevents standard
    characters from being encoded there.
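To make these categories concrete, here is a rough Python classifier. It is a sketch, not a substitute for the Unicode Character Database: the function name `classify` and its labels are my own. The ranges are the surrogates (U+D800..U+DFFF), the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane), and the three private-use ranges (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD).

```python
from collections import Counter

def classify(cp: int) -> str:
    """Coarse category of a code point with respect to encoding characters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in each of the 17 planes.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    if (0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD):
        return "private use"
    # Everything else is either an assigned character (graphic, format,
    # control) or reserved for future encoding.
    return "other"

counts = Counter(classify(cp) for cp in range(0x110000))
for kind in ("surrogate", "noncharacter", "private use", "other"):
    print(f"{kind:>13}: {counts[kind]:,}")
```

Run as-is, it reports 2,048 surrogates, 66 noncharacters, 137,468 private-use code points, and 974,530 others; the reserved code points counted below are a subset of that last group.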

    For a better picture of what is and is not available, and how
    many characters *can* be encoded in the future, see Table D-4
    in Appendix D of The Unicode Standard, Version 4.0:

    http://www.unicode.org/versions/Unicode4.0.0/appD.pdf

    As of Unicode 4.0, there were 878,083 code points reserved
    for future encoding.

    As of Unicode 5.0, there are currently 875,441 code points reserved
    for future encoding.

    --Ken


