Re: Request for review: 3023bis (XML media types) makes significant changes

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Thu, 19 Dec 2013 03:05:44 +0100

Henry and Martin,

Martin J. Dürst, Wed, 18 Dec 2013 16:59:10 +0900, in reply to Henry S.
Thompson:

>> * In cases where conflicting information is supplied (from charset
>> param, BOM and/or XML encoding declaration) it gives a BOM, if
>> present, authoritative status;
>
> I'm a bit uneasy about the fact that we now have BOM (internal) -
> charset (external) - encoding (internal), i.e.
> internal-external-internal,

A better way of looking at it would be that we now get External-Internal.

Here, external is subdivided into the charset parameter and the encoding
signature [BOM], and internal is subdivided into the encoding declaration
and the default/fallback encoding. Yeah, it might be that the lack of a
clear classification of the BOM as an external method is quite directly
linked to the lack of interoperability.

Previously we had External-Limbo-Internal. However, per XML, both BOM
and charset param are external.[1] The draft makes a point about
this:[2] ”[XML] further states that the BOM is an encoding signature,
and is not part of either the markup or the character data of the XML
document.”
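To illustrate what it means for the BOM to be an encoding signature rather than part of the document, here is a minimal Python sketch that detects a BOM and consumes it before any markup is looked at. The name `sniff_bom` is hypothetical (it is not from the draft, XML 1.0 or any library). It assumes the UTF-8, UTF-16 and UTF-32 signatures provided by Python's `codecs` module; XML 1.0 only requires processors to support UTF-8 and UTF-16, so the UTF-32 entries are an extra.

```python
import codecs
from typing import Optional, Tuple

# Checked longest-first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the UTF-32 signatures must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding named by the BOM or None, bytes after the BOM).

    The BOM is consumed here as an encoding signature; it is never
    passed on as markup or character data.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data
```

For example, `sniff_bom(b"\xef\xbb\xbf<a/>")` yields `("utf-8", b"<a/>")`: the three signature bytes are gone before parsing starts.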

> but I guess there is lots of experience
> in HTML 5 for giving the BOM precedence.

Sorry for focusing on XML rather than XML media types, but I think both
specs should be edited.

The way of looking at it that I propose above also incorporates the
fact that XML-capable Web browsers (the HTML 5 browsers) give
precedence to the BOM, without a fatal error, even when there is a
(conflicting) XML encoding declaration. (Btw, I find it very odd that,
up until now, the *charset* parameter could override the encoding
declaration, but if the BOM does the same [that is: overrides the
encoding declaration], *then* it is a fatal error ...)

It makes sense to treat all external encoding declaration methods the
same. Currently only the external *transport* protocol may override the
internal mechanism. But the BOM should have the same ”right”.

Therefore I would suggest that the other spec, XML 1.0, be amended in
section 4.3.3 [3] as follows (see the <INS> element):

]]In the absence of information provided by an external transport
protocol (e.g. HTTP or MIME) <INS>OR BY THE BYTE ORDER MARK</INS>, it
is a fatal error for an entity including an encoding declaration to be
presented to the XML processor in an encoding other than that named in
the declaration,[[

It should still be an error, but not a fatal error, if the XML encoding
declaration conflicts with an external method (BOM or HTTP).
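A minimal Python sketch of the External-Internal precedence proposed here, under stated assumptions: external information (the BOM first, as in the draft, then the charset parameter) outranks internal information (the encoding declaration, then the UTF-8 default), and a declaration that conflicts with an external source is recorded as an error instead of being raised as a fatal one. `resolve_encoding` and `Resolution` are hypothetical names. The sketch assumes the BOM and the declared encoding have already been extracted, and it does not check whether the bytes really are in the chosen encoding; that check is where the remaining fatal-error case (no external information, wrong declaration) would be caught.

```python
import codecs
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resolution:
    encoding: str                                     # encoding to decode with
    source: str                                       # layer that supplied it
    errors: List[str] = field(default_factory=list)   # non-fatal errors


def _family(name: str) -> str:
    """Normalize a label so that e.g. 'UTF-16' and a BOM-detected
    'utf-16-le' compare equal (the byte order comes from the BOM)."""
    try:
        canon = codecs.lookup(name).name
    except LookupError:
        canon = name.strip().lower()  # unknown label: compare as-is
    for suffix in ("-le", "-be"):
        if canon.endswith(suffix):
            return canon[: -len(suffix)]
    return canon


def resolve_encoding(bom_encoding: Optional[str],
                     charset_param: Optional[str],
                     declared: Optional[str]) -> Resolution:
    # External layer: the BOM (encoding signature) outranks the charset param.
    for source, enc in (("BOM", bom_encoding), ("charset", charset_param)):
        if enc:
            result = Resolution(enc, source)
            # A conflicting internal declaration is an error, but not fatal.
            if declared and _family(declared) != _family(enc):
                result.errors.append(
                    f"encoding declaration {declared!r} conflicts with "
                    f"{source} {enc!r}")
            return result
    # Internal layer: the encoding declaration, then the UTF-8 default.
    if declared:
        return Resolution(declared, "declaration")
    return Resolution("utf-8", "default")
```

With this model, `resolve_encoding("utf-8", "iso-8859-1", "iso-8859-1")` decodes as UTF-8 (from the BOM) and reports one non-fatal conflict with the declaration, so the BOM gets the same "right" as the transport protocol.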

[1] http://www.w3.org/TR/REC-xml/#NT-document
[2] http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-06#section-3.3
[3] http://www.w3.org/TR/REC-xml/#charencoding

-- 
leif halvard silli
Received on Wed Dec 18 2013 - 20:10:06 CST
