Re: Names for UTF-8 with and without BOM

From: John Cowan (jcowan@reutershealth.com)
Date: Sat Nov 02 2002 - 19:01:53 EST

Next message: Tex Texin: "Re: Names for UTF-8 with and without BOM"

Previous message: David Starner: "Re: Header Reply-To"
In reply to: Tex Texin: "Re: Names for UTF-8 with and without BOM"
Next in thread: Tex Texin: "Re: Names for UTF-8 with and without BOM"
Reply: Tex Texin: "Re: Names for UTF-8 with and without BOM"

Tex Texin wrote:

> However, that leaves open the question whether only the Unicode
> transform signatures are acceptable or other signatures are also
> allowed. So if a vendor defines a code page, and defines a signature
> (perhaps mapping BOM/ZWNSP specifically to some code point or byte
> string) does that then become acceptable?

IMHO, yes. XML documents are not *required* to be in one of the character
sets that can be automatically detected by the methods of Appendix F of the
XML 1.0 Recommendation. You can encode your documents in (hypothetical)
JOECODE, which uses a leading 0x00 byte as a signature (ignored by the XML
parser) and then A=0x01, B=0x02, C=0x03, and so on. Autodetection will not
work here, but it is perfectly conformant to have a processor that
understands only UTF-8, UTF-16, and JOECODE.
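To make that concrete, here is a minimal Python sketch of such a processor, one that understands only UTF-8, UTF-16, and JOECODE and chooses among them by signature. JOECODE is hypothetical, so the mapping beyond A=0x01, B=0x02, C=0x03 is an assumption here (continuing to Z=0x1A, every other byte undefined), and the helper names `joecode_decode` and `sniff_and_decode` are made up for illustration. It checks signatures only; it does not implement the full Appendix F table.

```python
import codecs


def joecode_decode(data: bytes) -> str:
    """Decode hypothetical JOECODE bytes (signature already stripped).

    Assumption: only A=0x01 .. Z=0x1A are defined; every other byte is an error.
    """
    chars = []
    for b in data:
        if 0x01 <= b <= 0x1A:
            chars.append(chr(ord("A") + b - 1))
        else:
            raise ValueError(f"byte 0x{b:02X} is not defined in this JOECODE sketch")
    return "".join(chars)


def sniff_and_decode(data: bytes) -> tuple[str, str]:
    """Return (encoding_name, text), choosing the encoding by its signature."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8", data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops it.
        return "UTF-16", data.decode("utf-16")
    if data.startswith(b"\x00"):
        # Leading 0x00 is the (hypothetical) JOECODE signature.
        return "JOECODE", joecode_decode(data[1:])
    # No recognised signature: fall back to UTF-8, the XML default.
    return "UTF-8", data.decode("utf-8")


print(sniff_and_decode(b"\x00\x01\x02\x03"))  # prints ('JOECODE', 'ABC')
```

Note what the sketch gives up: a UTF-16BE document without a BOM begins with 00 3C, which this processor treats as JOECODE (and then rejects). That clash is why autodetection stops working once an encoding like JOECODE is admitted.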

Of course, some encodings, such as US-BSCII, which looks just like US-ASCII
except that A=0x42, B=0x41, a=0x62, b=0x61, will cause problems for anybody.
:-)
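One way to see why US-BSCII defeats everybody: encode an XML declaration naming US-BSCII in US-BSCII itself, and the bytes read as a perfectly ordinary US-ASCII declaration. The short Python sketch below assumes US-BSCII is exactly US-ASCII with those four byte values swapped; `us_bscii_encode` is a made-up helper for illustration.

```python
# Hypothetical US-BSCII: US-ASCII with A/B and a/b swapped
# (A=0x42, B=0x41, a=0x62, b=0x61).
SWAP = {ord("A"): 0x42, ord("B"): 0x41, ord("a"): 0x62, ord("b"): 0x61}


def us_bscii_encode(text: str) -> bytes:
    """Encode text as US-BSCII; characters outside 7-bit ASCII are rejected."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is not in US-BSCII")
        out.append(SWAP.get(code, code))
    return bytes(out)


decl = '<?xml version="1.0" encoding="US-BSCII"?>'
# An ASCII-compatible reader looking at the raw bytes sees a US-ASCII declaration:
print(us_bscii_encode(decl).decode("ascii"))
# prints <?xml version="1.0" encoding="US-ASCII"?>
```

Since the bytes are self-consistent as US-ASCII, no processor can tell the two apart, and every A, B, a, and b in the content is silently swapped (for example, "Bob" comes out as "Aoa").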

I am a member of, but not speaking for, the XML Core WG.

-- 
John Cowan  jcowan@reutershealth.com  www.ccil.org/~cowan  www.reutershealth.com
"The competent programmer is fully aware of the strictly limited size of his own
skull; therefore he approaches the programming task in full humility, and among
other things he avoids clever tricks like the plague."  --Edsger Dijkstra

This archive was generated by hypermail 2.1.5 : Sat Nov 02 2002 - 19:38:36 EST