From: Tex Texin (tex@i18nguy.com)
Date: Sat Nov 02 2002 - 19:03:53 EST
Hi John,
I meant the character "<".
As for notepad, what I should have either stated more completely or
bitten my tongue about is that where there is a standard in place (and
where it is unambiguous), the mistakes of particular products shouldn't
hold sway unless they are tantamount to a de facto standard. I (personally) don't
hold notepad in that class. In particular with respect to Michka's
comment that parsers should upgrade to accommodate notepad's BOM, I
rather thought notepad should be changed. But I certainly don't want to
get into a debate on notepad's influence on the market, so let's pretend
I bit my tongue in the last mail, and once again in this mail. ;-)
tex
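
For reference, the first-byte cases John lists in the quoted message below
can be sketched roughly as follows. This is only an illustration, not any
parser's actual code: the function name and labels are made up, and the
byte patterns are the ones I understand the XML 1.0 autodetection appendix
to describe, which assumes an unmarked file starts with "<?xml". UTF-7 and
anything else unrecognized just falls through to "unknown".

```python
# Hypothetical helper: classify an XML byte stream by its leading bytes.
# Order matters: the 4-byte UTF-32 patterns must be tried before the
# 2-byte UTF-16 BOMs, because FF FE 00 00 also starts with FF FE.
_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UTF-32BE (BOM)"),
    (b"\xff\xfe\x00\x00", "UTF-32LE (BOM)"),
    (b"\xfe\xff", "UTF-16BE (BOM)"),
    (b"\xff\xfe", "UTF-16LE (BOM)"),
    (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
    (b"\x00\x00\x00\x3c", "UTF-32BE (no BOM)"),
    (b"\x3c\x00\x00\x00", "UTF-32LE (no BOM)"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE (no BOM)"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE (no BOM)"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC family"),
    # A bare 0x3C is the ASCII-compatible case; checked last on purpose.
    (b"\x3c", "ASCII-compatible"),
]


def guess_xml_encoding_family(data: bytes) -> str:
    """Return a coarse label for the encoding family, or "unknown"."""
    for prefix, label in _SIGNATURES:
        if data.startswith(prefix):
            return label
    return "unknown"
```

So a notepad-style file (EF BB BF followed by "<") lands in the
"UTF-8 (BOM)" bucket rather than being rejected for not starting with the
byte 0x3C.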
John Cowan wrote:
>
> Tex Texin scripsit:
>
> > I didn't think the XML standard allowed for utf-8 files to have a BOM.
>
> This capability was never actually excluded, and was added by erratum
> (and force-majeure, when it became clear that BOMful UTF-8 was going to
> start becoming common). XML files are intended to be plain text, and
> if a large source of plain text insists on a BOM, so be it.
>
> > The standard is quite clear about requiring 0xFEFF for utf-16.
> > I would have thought a proper parser would reject a non-utf-16 file
> > beginning with something other than "<".
>
> If by "<" you mean the *character* "<", then yes. If you mean the *byte*
> 0x3C, then no: well-formed XML files can begin with any of 0x00 (UTF-32),
> 0x3C (ASCII-compatible), 0x4C (EBCDIC), 0xEF (UTF-8 with BOM), 0xFE (UTF-16
> in BE order), or 0xFF (UTF-16 in LE order). In principle they could begin with
> some other byte: 0x2B in UTF-7, e.g.
>
> > (The fact that notepad puts it there should be irrelevant.)
>
> Actual practice is never quite irrelevant.
>
> --
> John Cowan jcowan@reutershealth.com http://www.reutershealth.com
> "Mr. Lane, if you ever wish anything that I can do, all you will have
> to do will be to send me a telegram asking and it will be done."
> "Mr. Hearst, if you ever get a telegram from me asking you to do
> anything, you can put the telegram down as a forgery."
-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------