Re: RE: Roundtripping in Unicode

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Dec 14 2004 - 07:54:53 CST


    Doug Ewell scripsit:

    > "When faced with [an] ill-formed code unit sequence while transforming
    > or interpreting text, a conformant process must treat the first code
    > unit... as an illegally terminated code unit sequence -- for example, by
    > signaling an error, filtering the code unit out, or representing the
    > code unit with a marker such as U+FFFD REPLACEMENT CHARACTER."
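The three options named in the quoted conformance text (signal an error, filter the code unit out, or substitute a marker) line up with Python's built-in `errors=` modes for `bytes.decode()`: `strict`, `ignore`, and `replace`. Here is a minimal sketch, assuming Python 3.9+. The sample bytes and the helper name `decode_all_ways` are illustrative only.

```python
# Hypothetical sample: "caf\xc3\xa9" is well-formed UTF-8 for "café",
# while the lone 0xFF byte can never occur in well-formed UTF-8.
data = b"caf\xc3\xa9 \xff ok"


def decode_all_ways(raw: bytes) -> dict[str, str]:
    """Decode raw UTF-8 under each of the three strategies."""
    results = {}
    try:
        # 'strict' (the default) signals an error on ill-formed input.
        results["signal"] = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        results["signal"] = f"error at byte {err.start}"
    # 'ignore' filters the offending code units out silently.
    results["filter"] = raw.decode("utf-8", errors="ignore")
    # 'replace' substitutes U+FFFD REPLACEMENT CHARACTER.
    results["marker"] = raw.decode("utf-8", errors="replace")
    return results
```

The filtered result keeps no trace that anything was dropped. The marker result preserves the position of the damage, and that is why substitution is usually preferred when text is only being displayed.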

    Plan 9, the original all-UTF-8 environment (it was converted from
    Latin-1 to UTF-8 in a single day), represents ill-formed code unit
    sequences with the otherwise useless U+0080, on the grounds that an
    ill-formed code unit sequence is semantically different from an
    untranslatable character, which is what U+FFFD is for.
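The Plan 9 policy can be imitated in Python with a custom decode error handler. The sketch below is a Python analogue of the policy described above, not Plan 9's own code. The handler name `plan9-marker` and the functions `plan9_marker` and `decode_plan9` are hypothetical. The handler follows the quoted rule literally: it treats only the first code unit of the bad sequence as ill-formed, emits U+0080 for it, and resumes decoding at the next byte.

```python
import codecs


def plan9_marker(exc: UnicodeError) -> tuple[str, int]:
    """Hypothetical handler: mark one ill-formed byte with U+0080."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Replace just the first code unit and resume right after it, so every
    # stray byte of a broken sequence gets its own marker.
    return ("\u0080", exc.start + 1)


codecs.register_error("plan9-marker", plan9_marker)


def decode_plan9(raw: bytes) -> str:
    """Decode UTF-8, mapping ill-formed bytes to U+0080."""
    return raw.decode("utf-8", errors="plan9-marker")
```

With this handler, a genuine U+FFFD in the input (bytes `EF BF BD`) still decodes to U+FFFD, while ill-formed bytes become U+0080, so the two cases remain distinguishable. The distinction only holds if U+0080 never occurs legitimately in the text, which is the "otherwise useless" assumption the policy relies on.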

    -- 
    LEAR: Dost thou call me fool, boy?      John Cowan
    FOOL: All thy other titles              http://www.ccil.org/~cowan
                 thou hast given away:      jcowan@reutershealth.com
          That thou wast born with.         http://www.reutershealth.com
    


    This archive was generated by hypermail 2.1.5 : Tue Dec 14 2004 - 07:59:25 CST