personal comments about http://www.w3.org/TR/xml11/

From: Yung-Fong Tang (ftang@netscape.com)
Date: Mon Mar 10 2003 - 20:42:53 EST

    Dear John Cowan:

    Here are my personal comments about
    http://www.w3.org/TR/xml11/
    "Extensible Markup Language (XML) 1.1
    W3C Candidate Recommendation 15 October 2002"

    1. In 2.2 Characters, you exclude the characters U+FFFE and U+FFFF.
    According to Unicode 3.1, the list of noncharacters has been extended;
    see:

          Noncharacters

    There are 34 specific code points in Unicode 3.0 that are characterized
    as noncharacters. Unicode 3.1 adds an additional 32 noncharacters. To
    clarify the status of all 66, a definition (page 41) is added, and
    conformance rules C5 and C10 (pages 38, 39) are amended as follows:

        D7b Noncharacter: a code point that is permanently reserved for
        internal use, and that should never be interchanged. In Unicode 3.1,
        these consist of the values U+nFFFE and U+nFFFF (where n is from 0
        to 10 hexadecimal) and the values U+FDD0..U+FDEF.

            * For more information, see the discussions under "Special
              Noncharacter Values" in Section 2.7, Special Character and
              Noncharacter Values, and under "Noncharacters" in Section
              13.6, Specials.
            * These code points are permanently reserved as noncharacters.
              In the future, it is possible that additional code points may
              be specified to represent noncharacters.

        C5 A process shall not interpret a noncharacter code point as an
        abstract character.

            * The code points may be used internally, such as for sentinel
              values or delimiters, but should not be exchanged publicly.

        C10 A process shall make no change in a valid coded character
        representation other than the possible replacement of character
        sequences by their canonical-equivalent sequences or the deletion of
        noncharacter code points, if that process purports not to modify the
        interpretation of that coded character sequence.

            * If a noncharacter which does not have a specific internal use
              is unexpectedly encountered in processing, an implementation
              may signal an error or delete or ignore the noncharacter. If
              these options are not taken, the noncharacter should be
              treated as an unassigned code point. For example, an API that
              returned a character property value for a noncharacter would
              return the same value as the default value for an unassigned
              code point.

    (quoted from http://www.unicode.org/reports/tr27/)
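A minimal Python sketch of definition D7b as quoted above may make the set concrete. The function name `is_noncharacter` is hypothetical, chosen only for illustration; the ranges come directly from the quoted text (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 noncharacters listed in D7b."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF: the 32 noncharacters added in Unicode 3.1.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for n = 0x0..0x10: the last two code points
    # of every plane, i.e. the low 16 bits are FFFE or FFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Count the noncharacters across the whole code space (hypothetical name).
NONCHARACTER_COUNT = sum(is_noncharacter(cp) for cp in range(0x110000))
```

Counting over the full code space gives 34 + 32 = 66, which matches the total stated in the quoted report.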

    Therefore, should the following production be changed from
    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
    blocks, FFFE, and FFFF. */

    to
    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFDCF]
    | [#xFDF0-#xFFFD] | [#x10000-#x1FFFD] | [#x20000-#x2FFFD] |
    [#x30000-#x3FFFD] | [#x40000-#x4FFFD] | [#x50000-#x5FFFD] |
    [#x60000-#x6FFFD] | [#x70000-#x7FFFD] | [#x80000-#x8FFFD] |
    [#x90000-#x9FFFD] | [#xA0000-#xAFFFD] | [#xB0000-#xBFFFD] |
    [#xC0000-#xCFFFD] | [#xD0000-#xDFFFD] | [#xE0000-#xEFFFD] |
    [#xF0000-#xFFFFD] | [#x100000-#x10FFFD] /* any Unicode character,
    excluding the surrogate blocks, FDD0 to FDEF, nFFFE, and nFFFF. */
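The two productions can be compared mechanically. The following is only a sketch under stated assumptions: the names `CURRENT_CHAR`, `proposed_char_ranges`, `size`, and `is_proposed_char` are hypothetical, and the proposed ranges assume the final alternative is meant to cover plane 16 (#x100000-#x10FFFD).

```python
# Inclusive (low, high) ranges for the current production [2] Char.
CURRENT_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def proposed_char_ranges():
    """Build the proposed Char ranges, leaving out all 66 noncharacters."""
    ranges = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
              (0x20, 0xD7FF), (0xE000, 0xFDCF), (0xFDF0, 0xFFFD)]
    # Planes 1 through 16: everything except the trailing nFFFE and nFFFF.
    for plane in range(1, 0x11):
        base = plane << 16
        ranges.append((base, base | 0xFFFD))
    return ranges


def size(ranges):
    """Number of code points covered by a list of inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def is_proposed_char(cp):
    """True if cp matches the proposed Char production."""
    return any(lo <= cp <= hi for lo, hi in proposed_char_ranges())


# Code points the proposal would newly exclude (hypothetical name).
REMOVED = size(CURRENT_CHAR) - size(proposed_char_ranges())
```

Here `REMOVED` evaluates to 64: the 32 code points U+FDD0..U+FDEF plus nFFFE and nFFFF in planes 1 through 16. U+FFFE and U+FFFF are already excluded by the current production.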

    2. A similar change should apply to
    [4] NameStartChar:
    #xFDD0-#xFDEF should not be allowed in NameStartChar, and
    neither nFFFE nor nFFFF should be allowed in NameStartChar either.

    It looks like NameStartChar does not allow the private use area
    [#xE000-#xF8FF]. If we follow that principle, then [#xF0000-#x10FFFF]
    should not be in NameStartChar either, since
    http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt defines them
    as the Supplementary Private Use Areas:

    F0000..FFFFF; Supplementary Private Use Area-A
    100000..10FFFF; Supplementary Private Use Area-B

    Also, I doubt we should allow

    E0000..E007F; Tags

    to be used as NameStartChar
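To see what these exclusions would leave, here is a rough Python sketch that subtracts excluded spans from a list of inclusive ranges. It assumes, for illustration only, that the supplementary part of NameStartChar is the single range [#x10000-#x10FFFF]; the names `subtract`, `to_ebnf`, `SUPPLEMENTARY`, `EXCLUDED`, and `REMAINING` are hypothetical.

```python
def subtract(ranges, excluded):
    """Remove every excluded (lo, hi) span from the inclusive ranges."""
    result = list(ranges)
    for ex_lo, ex_hi in excluded:
        kept = []
        for lo, hi in result:
            if ex_hi < lo or ex_lo > hi:
                kept.append((lo, hi))            # no overlap, keep as is
                continue
            if lo < ex_lo:
                kept.append((lo, ex_lo - 1))     # part below the excluded span
            if ex_hi < hi:
                kept.append((ex_hi + 1, hi))     # part above the excluded span
        result = kept
    return sorted(result)


def to_ebnf(ranges):
    """Format ranges in the notation used by the XML productions."""
    return " | ".join(f"[#x{lo:X}-#x{hi:X}]" for lo, hi in ranges)


# Assumed starting point: all supplementary code points.
SUPPLEMENTARY = [(0x10000, 0x10FFFF)]

EXCLUDED = [(0xE0000, 0xE007F),       # Tags
            (0xF0000, 0x10FFFF)]      # Supplementary Private Use Area-A and -B
# nFFFE and nFFFF in the planes that remain (1 through 0xE).
EXCLUDED += [((p << 16) | 0xFFFE, (p << 16) | 0xFFFF) for p in range(1, 0xF)]

REMAINING = subtract(SUPPLEMENTARY, EXCLUDED)
```

Calling `to_ebnf(REMAINING)` yields fourteen alternatives, running from [#x10000-#x1FFFD] to [#xE0080-#xEFFFD], which could be pasted into a revised production if the exclusions are accepted.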

    Frank Yung-Fong Tang



    This archive was generated by hypermail 2.1.5 : Mon Mar 10 2003 - 21:26:18 EST