Re: Handling of Surrogates

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Apr 17 2009 - 13:14:33 CDT


    On 4/17/2009 9:14 AM, John W Kennedy wrote:
    >
    > On Apr 17, 2009, at 7:32 AM, Sam Mason wrote:
    >
    >> On Thu, Apr 16, 2009 at 01:04:30PM -0700, Asmus Freytag wrote:
    >>> What should definitely result in an error is to write '\U0000D800'
    >>> because the 8-digit form is to be understood as UTF-32, and in that
    >>> context there would be an issue.
    >>
    >> That strikes me as too pedantic; if we did that, should we also reject
    >> the number one when spelled as '00000000001'?
    >
    > Quite a few programming languages will reject '00000000008'.
    >
    Note that my example (D800) was an unpaired surrogate.
    These are not legal UTF-32, hence "there would be an issue".

    I should be more precise in one respect: this usage should
    result in an error if the string into which it is inserted is
    supposed to be valid when mapped to UTF-32.
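
    A minimal sketch of that check in Python. The names
    is_unicode_scalar_value and check_escape_value are hypothetical
    helpers, not part of any library. The sketch assumes a value is
    acceptable in UTF-32 when it lies in 0..10FFFF and outside the
    surrogate range D800..DFFF.

```python
def is_unicode_scalar_value(cp: int) -> bool:
    """Return True if cp is in 0..0x10FFFF and is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def check_escape_value(hex_digits: str) -> int:
    """Convert the hex digits of an escape to a code point.

    Raises ValueError when the target string must be valid UTF-32
    and the value is a surrogate or out of range.
    """
    cp = int(hex_digits, 16)  # leading zeros do not change the value
    if not is_unicode_scalar_value(cp):
        raise ValueError("U+%04X is not valid in UTF-32" % cp)
    return cp
```

    With these helpers, check_escape_value("0000D800") raises
    ValueError, while check_escape_value("00010000") is accepted.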

    If not, then the only issue is whether redundant leading zeros
    are a problem. They are not, as long as the U prefix, and not
    the u prefix, is used.
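
    The difference between the two prefixes can be sketched in
    Python. This assumes the common C-style convention that \u is
    followed by exactly 4 hex digits and \U by exactly 8; the function
    expand_escapes is a hypothetical helper written for illustration.
    It returns code points as integers so that no validity decision is
    made at this stage.

```python
import re
from typing import List

# Assumed convention: \u takes exactly 4 hex digits, \U exactly 8.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_escapes(text: str) -> List[int]:
    """Return the code points of text with u/U escapes expanded."""
    out: List[int] = []
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Copy the literal characters before this escape.
        out.extend(ord(c) for c in text[pos:m.start()])
        # Exactly one of the two groups matched; parse its hex digits.
        out.append(int(m.group(1) or m.group(2), 16))
        pos = m.end()
    # Copy whatever literal text follows the last escape.
    out.extend(ord(c) for c in text[pos:])
    return out
```

    Under that assumption, the input \U0000D800 yields the single
    value D800, whereas \u0000D800 yields U+0000 followed by the four
    literal characters "D800". The leading zeros are harmless only in
    the 8-digit form.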

    A./


