RFC, 5-6 octets sequence in UTF8, non short form in UTF8

From: Yung-Fong Tang (ftang@netscape.com)
Date: Tue Feb 18 2003 - 15:10:07 EST

    I read RFC 2279 again (
    http://www.cis.ohio-state.edu/cs/Services/rfc/rfc-text/rfc2279.txt )
    1. I cannot find any text in it saying that non-shortest form is
    invalid UTF-8,
    2. It describes UTF-8 as using 1-6 octets, and
    3. It describes how to encode a surrogate pair in UTF-8, but it does
    not say that UTF-8 sequences mapping directly to a high surrogate or
    a low surrogate are illegal.
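
    To make the three cases concrete, here is a small sketch in Python 3.
    It is only an illustration, not anything taken from the RFC: the byte
    strings are examples I picked, the helper name is_accepted is made up,
    and the check relies on the assumption that Python's built-in strict
    "utf-8" codec follows the stricter 1-4 octet definition.

```python
def is_accepted(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative byte sequences (chosen for this sketch only).
samples = {
    "shortest form of U+002F": b"\x2f",
    "non-shortest (2-octet) form of U+002F": b"\xc0\xaf",
    "directly encoded high surrogate U+D800": b"\xed\xa0\x80",
    "5-octet sequence (RFC 2279 layout, U+200000)": b"\xf8\x88\x80\x80\x80",
}

for label, raw in samples.items():
    verdict = "accepted" if is_accepted(raw) else "rejected"
    print(f"{label}: {raw.hex(' ')} -> {verdict}")
```

    With a strict decoder, only the first sequence should be accepted.
    A decoder written against the looser reading of RFC 2279 might accept
    the others, which is the ambiguity the points above are about.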

    I remember that in the last couple of years the definition of UTF-8
    has been changing from 1-6 octets to 1-4 octets because of decisions
    about the future roadmap of Unicode/ISO 10646.

    Here are my questions:
    1. Is there an updated RFC that obsoletes RFC 2279? (I cannot find
    one. If there is, what are its number and URL?)
    2. Is there a formal specification saying that non-shortest form is
    illegal in UTF-8, and that directly encoding a surrogate is illegal?
    RFC 2279 mentions non-shortest form only lightly: it notes that it is
    a security concern but does not formally specify that it is illegal.
    Or maybe the language in RFC 2279 is good enough.
    3. Is there a formal specification stating that UTF-8 is only 1-4
    octets, thereby updating the part of RFC 2279 that mentions 1-6 octets?

    Thanks


