Re: Corrigendum #1 (UTF-8 shortest form) wording: MIME, and software interfaces specifications

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 07 2003 - 19:32:59 EST

    From: "Doug Ewell" <dewell@adelphia.net>

    > Philippe Verdy wrote (in rich text):
    >
    > > Due to that, an application needs to specify whether it will support
    > > and comply with the full ISO/IEC 10646-1:2000 character set or to the
    > > Unicode subset.
    >
    > ISO/IEC 10646 has reduced its range to match Unicode's, so this
    > distinction is obsolete.

    It is not obsolete: Corrigendum #1 for UTF-8 (published in Unicode 4.0)
    refers to ISO/IEC 10646-1:2000, not to ISO/IEC 10646:2003, the edition
    whose character repertoire corresponds to Unicode 4.0.

    So that is a reference error in the now-normative corrigendum as
    published in Unicode 4.0.

    Does it need another Corrigendum to correct this reference in the
    Corrigendum?

    Well, I still doubt that ISO/IEC 10646 has reduced its character set. It
    has only agreed to limit its repertoire of _standardized_ and
    _interchangeable_ characters to the first 17 planes, so that _these_
    characters stay in sync with the Unicode repertoire and are encoded
    identically, at the same code points. All the other planes are still
    present in ISO/IEC 10646, and some of them are still allocated to private
    use areas (PUAs) that have no equivalent in Unicode. Characters in those
    planes remain valid in UTF-8 encoded data and still conform to
    ISO/IEC 10646: even though they are illegal in Unicode 4.0, their byte
    sequences are not ill-formed in the way non-shortest forms are, which
    both standards now forbid.
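The distinction drawn above, between sequences that are ill-formed (such as non-shortest forms) and sequences that are well-formed but encode values beyond U+10FFFF, can be illustrated with a minimal Python sketch. It assumes the input is exactly one complete encoded character. The function name `classify_utf8_char` and its category labels are hypothetical, not taken from either standard. It accepts 5- and 6-byte sequences as the original 31-bit UTF-8 (as specified in RFC 2279) allowed them, rejects overlong forms, and reports surrogate code points separately without deciding their status under ISO/IEC 10646.

```python
def classify_utf8_char(seq: bytes) -> str:
    """Classify ONE complete encoded character (hypothetical helper).

    Returns:
      "ill-formed"     - bad lead/continuation bytes, wrong length,
                         or a non-shortest (overlong) form
      "surrogate"      - shortest form of U+D800..U+DFFF
      "unicode"        - shortest form of a value <= U+10FFFF
      "beyond-unicode" - shortest form of a value above U+10FFFF, which
                         only the original 31-bit UTF-8 (up to 6 bytes)
                         can express
    """
    if not seq:
        return "ill-formed"
    lead = seq[0]
    # From the lead byte: total length, initial value bits, and the
    # smallest value that actually REQUIRES this length (shortest form).
    if lead < 0x80:
        length, value, minimum = 1, lead, 0
    elif 0xC0 <= lead < 0xE0:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead < 0xF0:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead < 0xF8:
        length, value, minimum = 4, lead & 0x07, 0x10000
    elif 0xF8 <= lead < 0xFC:
        length, value, minimum = 5, lead & 0x03, 0x200000
    elif 0xFC <= lead < 0xFE:
        length, value, minimum = 6, lead & 0x01, 0x4000000
    else:
        # Stray continuation byte (0x80..0xBF) or 0xFE/0xFF.
        return "ill-formed"
    if len(seq) != length:
        return "ill-formed"
    for b in seq[1:]:
        if b & 0xC0 != 0x80:  # every trailing byte must be 10xxxxxx
            return "ill-formed"
        value = (value << 6) | (b & 0x3F)
    if value < minimum:
        # Non-shortest form: ill-formed in both standards.
        return "ill-formed"
    if 0xD800 <= value <= 0xDFFF:
        return "surrogate"
    if value <= 0x10FFFF:
        return "unicode"
    return "beyond-unicode"


if __name__ == "__main__":
    for s in (b"\x41", b"\xC0\x80", b"\xE2\x82\xAC", b"\xF4\x90\x80\x80"):
        print(s, classify_utf8_char(s))
```

For example, `C0 80` (an overlong encoding of U+0000) comes out as ill-formed, while `F4 90 80 80` (the shortest form of 0x110000) comes out as "beyond-unicode": structurally sound, but outside the Unicode code space.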



    This archive was generated by hypermail 2.1.5 : Fri Nov 07 2003 - 20:09:46 EST