Re: XML and Unicode interoperability comes before HTML or even SGML

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Aug 15 2004 - 15:44:00 CDT

    From: "Doug Ewell" <dewell@adelphia.net>
    > W3C still maintains a distinction between HTML and XHTML, and still
    > offers both specifications on its site.

    And Unicode still publishes its previous versions.
    Even the RFC Editor still publishes deprecated RFCs on its web site
    (www.rfc-editor.org is the official publication site, even if many RFCs
    are still hosted on the ietf.org web site, as the IETF was at the origin of
    most RFCs).

    > HTML is not deprecated.

    I did not say that. I just said that XHTML is the current recommendation of
    the W3C, and that HTML 4.01 will remain documented even if it is later
    officially deprecated.

    That said, HTML 4.01 is still the specification most used in
    implementations, even though current browsers behave correctly with
    XHTML, which offers very good backward compatibility. In practice, this
    means there is no reason for authors to keep using HTML 4.01 for their
    documents.

    The only problem is for users of WYSIWYG HTML editors, which often do not
    comply with XHTML requirements. For example, only FrontPage in its 2003
    version can generate XHTML-conformant documents, but it does not do so
    by default: the designer must still use an explicit command to reformat the
    document with an XML-conformant syntax, and there is still no check of the
    document to see whether it will validate against a specific XHTML DTD or
    schema.

    -- For various reasons, authors still need to be allowed to generate legacy
    HTML elements like <center> or <applet> even if they are not part of the
    loosest XHTML schema, as legacy browsers still won't recognize blocks
    centered with <div align="center"> elements or applets referenced by
    <object> (there are still disagreements between implementations about how
    external object types should be designated).

    So in practice, XHTML 1.1 (with its strict but modular and extensible
    schema) is a design goal for the future (once standard modules have been
    developed and agreed upon by browser vendors), but XHTML 1.0 with its
    "loose" schema offers excellent interoperability with the benefit of
    full XML conformance. And if authors don't care about conformance with a
    specific XHTML schema version, they can still use the legacy elements they
    want within an XML-conformant document, and label it with the "text/html"
    MIME type (they just need to not reference the standard XHTML 1.0 or 1.1 DTD
    in their DOCTYPE declaration, or they can reference their own DTD that
    allows validating their documents).
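
    As a rough illustration of that last point, here is a minimal Python
    sketch using the standard xml.etree.ElementTree module. The page content
    is a made-up example: it has no DOCTYPE, uses the legacy <center> element,
    and is still accepted by a generic XML parser because its syntax is
    well-formed. The parser checks well-formedness only, not validity against
    any DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: XML-conformant syntax, no DOCTYPE, legacy <center> element.
PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Legacy markup, XML syntax</title></head>
  <body>
    <center>Centered with a legacy element<br /></center>
  </body>
</html>"""

# XHTML namespace in ElementTree's "{uri}localname" notation.
NS = "{http://www.w3.org/1999/xhtml}"

# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(PAGE.encode("utf-8"))

# The legacy element is just another element to an XML parser.
center = root.find(f".//{NS}center")
print(root.tag)      # {http://www.w3.org/1999/xhtml}html
print(center.text)   # Centered with a legacy element
```

    Whether a given browser renders such a document as intended when it is
    served as "text/html" is a separate question that this sketch does not
    test.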

    The most important thing is not which precise schema they use in their
    documents, but the fact that they have prepared their documents so that they
    can be accepted by standard and simpler XML parsers (HTML parsers are really
    huge, full of hacks that try to mimic the interpretation bugs of legacy
    browsers, difficult to maintain, and contain too many bugs or
    interoperability problems). I also don't know of any HTML 4.01 parser that
    actually fully respects the HTML 4.01 specification, and I think that any
    implementation that tried to do so would fail to render many web sites
    designed for either Internet Explorer or Netscape 4 (and many websites still
    don't work correctly with Mozilla-based browsers, unless the websites use
    many browser detection scripts and server-side dynamic code generation). --
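
    To make the contrast with "tag soup" concrete, the following sketch
    (again Python with xml.etree.ElementTree; the helper name is_well_formed
    and the sample fragments are hypothetical) shows how a standard XML parser
    simply rejects markup that legacy HTML parsers are expected to repair
    silently. That strictness is a large part of why such parsers can stay
    small.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup as-is."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Typical HTML 4.01 habits: unclosed elements and unquoted attribute values.
TAG_SOUP = "<p>First line<br>Second line"
UNQUOTED = "<p align=center>Some text</p>"

# The same content written with XML-conformant syntax.
XML_STYLE = '<p align="center">First line<br />Second line</p>'

for sample in (TAG_SOUP, UNQUOTED, XML_STYLE):
    print(is_well_formed(sample), sample)
```

    Only the last fragment is accepted; an HTML parser would have to guess
    what the author meant for the first two.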

    The main purpose of my message was to warn Unicode that the Technical Report
    about interoperability of XML and Unicode has not been reviewed since the
    recent changes in Unicode 4.0.1, which included ZW(N)J within
    combining sequences. Maybe there is some work in progress at the W3C or in a
    technical committee to make the necessary changes in this UTR, but for now
    the changes in clauses D14 and D17 create new, unexpected interoperability
    problems with XML. Solving these problems for XML will at the same time help
    solve the problem in XHTML and HTML...
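
    For readers who want to inspect the characters involved, here is a small
    Python sketch using the standard unicodedata module. It does not
    reproduce the specific XML problems mentioned above; it only shows, under
    the Unicode data shipped with the Python interpreter in use, that ZWJ and
    ZWNJ are format characters with combining class 0, and that a ZWJ placed
    between a base letter and a combining mark keeps normalization from
    composing them. The sample strings are illustrative assumptions.

```python
import unicodedata

ZWJ = "\u200D"    # ZERO WIDTH JOINER
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# Both joiners are format characters (Cf) with canonical combining class 0.
for name, ch in (("ZWJ", ZWJ), ("ZWNJ", ZWNJ)):
    print(name, unicodedata.category(ch), unicodedata.combining(ch))

plain = "a" + ACUTE         # base letter + combining mark
joined = "a" + ZWJ + ACUTE  # same, with a joiner in between

# Without the joiner, NFC composes the pair into U+00E1.
print(unicodedata.normalize("NFC", plain) == "\u00E1")   # True

# With the joiner, nothing composes and the string is left unchanged.
print(unicodedata.normalize("NFC", joined) == joined)    # True
```

    So two texts that a reader might consider equivalent can remain distinct
    after normalization, which is the kind of case any XML processing that
    relies on normalized text would need to take into account.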


