Re: problems in Public Review 33 UTF Conversion Code Update

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 19 2004 - 20:19:44 CDT

    /|/|ike (or |\|\ike) responded to Philippe:

    > > However I feel it's not legal (or really not recommended) to encode
    > > noncharacter code points xFFFE-xFFFF where x is any plane number. So the
    > > rules need to be a bit more detailed to exclude them.
    >
    > Why do we need special rules to not encode characters that don't
    > exist?

    Please, everybody, before we start another pointless thread,
    examine the actual definition of UTF-8 and the rationale
    for an encoding scheme.

    UTF-8 must be able to represent every Unicode scalar value --
    and that *includes* noncharacter code points.

    D28 Unicode scalar value: Any Unicode code point except high-surrogate
        and low-surrogate code points.
        
    D29 A Unicode encoding form assigns each Unicode scalar value to a
        unique code unit sequence.
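
As a rough illustration of what D28 and D29 imply, here is a minimal Python sketch. The helper names (is_scalar_value, is_noncharacter, utf8_encode) are made up for this example and come from neither the standard nor the conversion code under review; the encoding itself is delegated to Python's built-in str.encode. It assumes the usual set of 66 noncharacters: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_scalar_value(cp: int) -> bool:
    """D28: any code point except the surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value as UTF-8 (hypothetical helper).

    Only surrogates and out-of-range values are rejected.
    Noncharacters are scalar values, so they are encoded like
    any other code point.
    """
    if not is_scalar_value(cp):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp).encode("utf-8")


if __name__ == "__main__":
    for cp in (0xFFFE, 0xFFFF, 0x1FFFE, 0x10FFFF):
        print(f"U+{cp:04X}", is_noncharacter(cp), utf8_encode(cp).hex(" "))
```

In this sketch the only rejection test is "not a scalar value". Whether a particular application chooses to accept noncharacters in interchange is a separate question from whether the encoding form can represent them.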
        
    Before you all start shooting from the hip about UTF-8 on the
    list, please read (and understand) the normative definitions of
    these things in the standard.

    --Ken

    P.S. Whoever (and whatever) is starting to prepend "[BULK]" to
    thread topics, would you cease and desist? ;-)


