From: John Cowan (jcowan@reutershealth.com)
Date: Sat Nov 02 2002 - 18:46:39 EST
Tex Texin scripsit:
> I didn't think the XML standard allowed for utf-8 files to have a BOM.
This capability was never actually excluded, and was made explicit by an erratum
(and by force majeure, once it became clear that BOMful UTF-8 was going to
become common). XML files are intended to be plain text, and
if a large source of plain text insists on a BOM, so be it.
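A minimal sketch of what "BOMful UTF-8" looks like in practice, using only the Python 3 standard library. It assumes a recent Python 3, where ElementTree sits on top of expat. The sample document and variable names are made up for illustration. It shows the three-byte signature EF BB BF, the difference between Python's "utf-8" and "utf-8-sig" codecs, and an expat-based parser accepting the BOM when handed raw bytes.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical sample: a UTF-8 document prefixed with a BOM, the way
# Notepad writes UTF-8 files.
data = codecs.BOM_UTF8 + b"<?xml version='1.0' encoding='UTF-8'?><greeting>hi</greeting>"

print(data[:3].hex())  # efbbbf

# Plain "utf-8" keeps the BOM as the character U+FEFF; "utf-8-sig" strips it.
print(data.decode("utf-8").startswith("\ufeff"))      # True
print(data.decode("utf-8-sig").startswith("<?xml"))   # True

# Given raw bytes, expat recognizes the UTF-8 BOM and skips it before
# the XML declaration.
root = ET.fromstring(data)
print(root.tag, root.text)  # greeting hi
```

Note the design choice: the parser is fed bytes, not an already decoded string. Encoding detection, BOM included, is the parser's job.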
> The standard is quite clear about requiring 0xFEFF for utf-16.
> I would have thought a proper parser would reject a non-utf-16 file
> beginning with something other than "<".
If by "<" you mean the *character* "<", then yes. If you mean the *byte*
0x3C, then no: well-formed XML files can begin with any of 0x00 (UTF-32),
0x3C (ASCII-compatible), 0x4C (EBCDIC), 0xEF (UTF-8 with BOM), 0xFE (UTF-16,
big-endian), or 0xFF (UTF-16, little-endian). In principle they could begin with
some other byte, e.g. 0x2B in UTF-7.
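The byte-versus-character point can be made concrete with a first-pass encoding sniffer. The following is a minimal Python sketch, loosely modelled on the non-normative table in XML 1.0, Appendix F. The names `SIGNATURES` and `sniff_xml_encoding` are hypothetical, and the labels are illustrative. The sketch omits the unusual octet orders (2143, 3412) that Appendix F also lists, and it makes no attempt at UTF-7. It only guesses an encoding *family*. A real parser then reads the encoding declaration to pin down the exact encoding.

```python
# Hypothetical signature table: (leading bytes, encoding family).
# Longer signatures come first so that, e.g., a UTF-32LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16LE BOM (FF FE).
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE (BOM)"),
    (b"\xff\xfe\x00\x00", "UTF-32LE (BOM)"),
    (b"\x00\x00\x00\x3c", "UTF-32BE"),          # "<" with no BOM
    (b"\x3c\x00\x00\x00", "UTF-32LE"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),          # "<?" with no BOM
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
    (b"\xfe\xff", "UTF-16BE (BOM)"),
    (b"\xff\xfe", "UTF-16LE (BOM)"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),  # "<?xm"
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),            # "<?xm" in EBCDIC
]


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML entity's encoding family from its first few bytes."""
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            return family
    # No recognizable signature: the entity should then be UTF-8 without
    # an encoding declaration (or the stream is mislabeled or corrupt).
    return "UTF-8 (assumed)"


print(sniff_xml_encoding(b"\xef\xbb\xbf<?xml version='1.0'?>"))        # UTF-8 (BOM)
print(sniff_xml_encoding("<?xml version='1.0'?>".encode("cp037")))   # EBCDIC
print(sniff_xml_encoding(b"<root/>"))                                # UTF-8 (assumed)
```

Under these assumptions, a UTF-7 document starting with "+" (0x2B) falls through to the default case. That is one reason such a byte table is only a heuristic.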
> (The fact that notepad puts it there should be irrelevant.)
Actual practice is never quite irrelevant.
-- 
John Cowan  jcowan@reutershealth.com  http://www.reutershealth.com
"Mr. Lane, if you ever wish anything that I can do, all you will have to do
will be to send me a telegram asking and it will be done."
"Mr. Hearst, if you ever get a telegram from me asking you to do anything,
you can put the telegram down as a forgery."