Re: Regulating PUA.

From: Adam Twardoch (list.adam@twardoch.com)
Date: Sun Jan 21 2007 - 20:35:02 CST


    vunzndi@vfemail.net wrote:
    > Are you saying UTF-16 doesn't support plane 16? That would make UTF-16
    > only part of Unicode.
    No, John, that's not the point. The point is that while a 32-bit
    encoding space can in theory hold any value up to 0xFFFFFFFF, codes
    higher than 0x0010FFFF are not valid Unicode codepoints. Mike uses such
    codes for internal purposes: they're invalid as Unicode codepoints but
    can still be used as "non-codepoints". Mike's software seems to filter
    out these non-codepoints when storing actual text, but it is worth
    noting that it would be possible to store them in both UTF-8 and
    UTF-32. UTF-16, by contrast, gives you no way to store them at all: a
    surrogate pair carries only 20 bits on top of the 0x10000 offset, so it
    cannot reach beyond U+10FFFF.
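To make the surrogate limit concrete, here is a minimal Python sketch. The function names are hypothetical, written only for this illustration. It encodes Mike's first contraction value, 0x110000, three ways: UTF-32 simply packs it into 4 bytes; the original UTF-8 definition from RFC 2279, which allowed sequences of up to 6 bytes, yields a 4-byte sequence; and UTF-16 has no way to express it, because 0x10000 + 0xFFFFF (the largest 20-bit payload) is exactly 0x10FFFF.

```python
from __future__ import annotations

import struct

UNICODE_MAX = 0x10FFFF


def utf16_units(cp: int) -> list[int]:
    """Encode one value as UTF-16 code units (surrogate pair above U+FFFF)."""
    if not 0 <= cp <= UNICODE_MAX:
        # A surrogate pair carries 10 + 10 = 20 payload bits on top of the
        # 0x10000 offset, so 0x10000 + 0xFFFFF = 0x10FFFF is the ceiling.
        raise ValueError(f"0x{cp:X} has no UTF-16 representation")
    if cp <= 0xFFFF:
        return [cp]  # lone surrogates are not checked in this sketch
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def utf32_be(cp: int) -> bytes:
    """Pack any 32-bit value as one big-endian 4-byte unit (no range check)."""
    return struct.pack(">I", cp)


def utf8_extended(cp: int) -> bytes:
    """Encode with the original RFC 2279 UTF-8 rules (up to 0x7FFFFFFF).

    RFC 3629 later restricted UTF-8 to U+10FFFF, so conforming decoders
    reject the sequences produced here for values above it.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"0x{cp:X} does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])
    # (sequence length, largest value that fits, lead-byte marker)
    for n, limit, lead in ((2, 0x7FF, 0xC0), (3, 0xFFFF, 0xE0),
                           (4, 0x1FFFFF, 0xF0), (5, 0x3FFFFFF, 0xF8),
                           (6, 0x7FFFFFFF, 0xFC)):
        if cp <= limit:
            tail = []
            for _ in range(n - 1):  # 6 payload bits per continuation byte
                tail.append(0x80 | (cp & 0x3F))
                cp >>= 6
            return bytes([lead | cp] + tail[::-1])
    raise AssertionError("unreachable")


if __name__ == "__main__":
    contraction = 0x110000  # first value above the Unicode range
    print("UTF-32:", utf32_be(contraction).hex(" "))
    print("UTF-8 (RFC 2279):", utf8_extended(contraction).hex(" "))
    try:
        utf16_units(contraction)
    except ValueError as e:
        print("UTF-16:", e)
```

Run as a script, it prints `00 11 00 00` for UTF-32 and `f4 90 80 80` for the extended UTF-8 form, then the UTF-16 error. Those UTF-8 bytes are precisely what current UTF-8 forbids, so they only work between components that agree to accept them.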

    Adam

    >
    > John Knightley (Linux, UTF-8 user)
    >
    > Quoting Richard Wordingham <richard.wordingham@ntlworld.com>:
    >
    >> Mike wrote on Sunday, January 21, 2007 6:56 PM
    >>
    >>> When I implemented collation, I needed to define code points for
    >>> the various contractions that can occur. To avoid clashing with
    >>> any private use code points, I chose to start allocating the
    >>> contractions at 0x110000. This has worked quite nicely.
    >>
    >> One problem with that solution is that, while it may work with
    >> extensions of UTF-8 or UTF-32, it just doesn't work with UTF-16.
    >> The other is that with those two, especially extended UTF-8, you
    >> are quite likely to fall foul of defensive code guarding against
    >> impossible codepoints. It's a shame, for I had been about to
    >> suggest it.
    >>
    >> Richard.
    >

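Mike's scheme and the failure mode Richard describes can be sketched together in Python. This is a hypothetical illustration, not Mike's code: `ContractionTable`, `collation_units` and the greedy longest-match rule are assumptions chosen for brevity, and the Spanish "ch"/"ll" contractions are only sample data. The IDs live as plain integers inside the collation key and never enter stored text; the last lines show what ordinary defensive code (here Python's own `chr()` and its strict UTF-8 decoder) does with such a value.

```python
from __future__ import annotations

CONTRACTION_BASE = 0x110000  # first value above U+10FFFF, so no PUA clash


class ContractionTable:
    """Hypothetical table giving each contraction an ID >= 0x110000."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def add(self, seq: str) -> int:
        # Allocate IDs sequentially from CONTRACTION_BASE; re-adding is a no-op.
        if seq not in self._ids:
            self._ids[seq] = CONTRACTION_BASE + len(self._ids)
        return self._ids[seq]

    def lookup(self, seq: str) -> int | None:
        return self._ids.get(seq)


def collation_units(text: str, table: ContractionTable,
                    max_len: int = 3) -> list[int]:
    """Greedy longest match: contractions become IDs, other chars ord()."""
    out: list[int] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to 2 characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            cid = table.lookup(text[i:i + n])
            if cid is not None:
                out.append(cid)
                i += n
                break
        else:  # no contraction starts at position i
            out.append(ord(text[i]))
            i += 1
    return out


if __name__ == "__main__":
    table = ContractionTable()
    ch = table.add("ch")
    table.add("ll")
    print([hex(u) for u in collation_units("chillar", table)])
    # Defensive code in action: Python refuses the ID as a character...
    try:
        chr(ch)
    except ValueError as e:
        print("chr():", e)
    # ...and refuses its extended-UTF-8 bytes (F4 90 80 80) when decoding.
    try:
        b"\xf4\x90\x80\x80".decode("utf-8")
    except UnicodeDecodeError as e:
        print("decode():", e.reason)
```

In this sketch, keeping the IDs in an integer list rather than a string type is what keeps them safe: as soon as they pass through a Unicode string or a conforming codec, they are rejected.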
    -- 
    Adam Twardoch
    | Language Typography Unicode Fonts OpenType
    | twardoch.com | silesian.com | fontlab.net
    


    This archive was generated by hypermail 2.1.5 : Sun Jan 21 2007 - 20:36:51 CST