Re: Names for UTF-8 with and without BOM

From: Tex Texin (tex@i18nguy.com)
Date: Sat Nov 02 2002 - 19:16:11 EST

    John,
    I understand the flexibility of XML to use different encodings.

    However, I didn't realize that parsers were to allow for the possibility
    of different signatures.
    So a parser has to worry about SCSU signatures, etc.

    Whereas XML is so fussy about which characters it accepts, I am
    surprised at its flexibility for signatures.
    So when the parser gets JOECODE, I can understand it ignoring the signature
    and skipping autodetection, but exactly how does it find the first "<"?
    It must have to try all of the encodings known to it... ugh.
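
For reference, here is a rough sketch of the kind of byte-pattern check that Appendix F of the XML 1.0 spec describes, as I understand it. The function name, the labels, and the table layout are my own assumptions, not anything from the spec. It only identifies an encoding family from the first four bytes (a signature, or the bytes of "<?xm"), and returns None when nothing matches, which is where a processor would have to fall back on external information or on its own list of known encodings.

```python
from typing import Optional

# Leading byte patterns, loosely following the XML 1.0 Appendix F table.
# Four-byte UCS-4 signatures come first so they are not mistaken for the
# two-byte UTF-16 signatures that share a prefix. Labels are illustrative.
PATTERNS = [
    (b"\x00\x00\xfe\xff", "UCS-4BE"),    # signature, byte order 1234
    (b"\xff\xfe\x00\x00", "UCS-4LE"),    # signature, byte order 4321
    (b"\x00\x00\xff\xfe", "UCS-4-2143"), # signature, unusual byte order
    (b"\xfe\xff\x00\x00", "UCS-4-3412"), # signature, unusual byte order
    (b"\xfe\xff", "UTF-16BE"),           # signature
    (b"\xff\xfe", "UTF-16LE"),           # signature
    (b"\xef\xbb\xbf", "UTF-8"),          # signature
    (b"\x00\x00\x00\x3c", "UCS-4BE"),    # "<" with no signature
    (b"\x3c\x00\x00\x00", "UCS-4LE"),    # "<" with no signature
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),   # "<?" with no signature
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),   # "<?" with no signature
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),  # "<?xm" in ASCII, UTF-8, etc.
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),     # "<?xm" in some EBCDIC flavor
]


def sniff_xml_encoding(data: bytes) -> Optional[str]:
    """Return an encoding-family guess from the leading bytes, or None."""
    for prefix, family in PATTERNS:
        if data.startswith(prefix):
            return family
    # No match: the caller needs external information, a default of UTF-8,
    # or its own knowledge of other encodings.
    return None
```

A match only narrows things down far enough to read the encoding declaration; the declaration still names the actual encoding. A JOECODE document would come back as None here, so the processor would have to be told the encoding some other way.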

    tex

    John Cowan wrote:
    >
    > Tex Texin scripsit:
    >
    > > However, that leaves open the question whether only the Unicode
    > > transform signatures are acceptable or other signatures are also
    > > allowed. So if a vendor defines a code page, and defines a signature
    > > (perhaps mapping BOM/ZWNSP specifically to some code point or byte
    > > string) does that then become acceptable?
    >
    > IMHO yes. XML documents are not *required* to be in one of the character
    > sets that can be automatically detected by the methods of Appendix F.
    > You can encode your documents in (hypothetical) JOECODE, which uses leading
    > 00 as a signature (ignored by the XML parser) and then A=01, B=02, C=03, and so on.
    > Autodetection will not work here, but it is perfectly conformant to have
    > a processor that understands only UTF-8, UTF-16, and JOECODE.
    >
    > Of course some encodings, such as US-BSCII, which looks just like US-ASCII
    > except that A=0x42, B=0x41, a=0x62, b=0x61 will cause problems for anybody.
    > :-)
    >
    > I am a member of, but not speaking for, the XML Core WG.
    >
    > --
    > John Cowan jcowan@reutershealth.com www.ccil.org/~cowan www.reutershealth.com
    > "The competent programmer is fully aware of the strictly limited size of his own
    > skull; therefore he approaches the programming task in full humility, and among
    > other things he avoids clever tricks like the plague." --Edsger Dijkstra
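
To make the JOECODE example in the quoted message concrete, here is a toy decoder. Everything in it is hypothetical: the message only gives a leading 00 signature and A=01, B=02, C=03, "and so on", so I am assuming the mapping simply continues through Z=26 and that any other byte is an error.

```python
def joecode_decode(data: bytes) -> str:
    """Decode the hypothetical JOECODE: optional 00 signature, then A=01..Z=26."""
    # Drop the leading 00 signature if present (assumed optional here).
    if data[:1] == b"\x00":
        data = data[1:]
    chars = []
    for byte in data:
        # Only 01..26 are defined under the assumption stated above.
        if not 1 <= byte <= 26:
            raise ValueError(f"byte 0x{byte:02x} is not defined in JOECODE")
        chars.append(chr(ord("A") + byte - 1))
    return "".join(chars)
```

For example, joecode_decode(b"\x00\x01\x02\x03") gives "ABC". None of these bytes looks like a Unicode signature or an ASCII "<", which is why a processor has to know in advance that it is reading JOECODE.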

    -- 
    -------------------------------------------------------------
    Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
    Xen Master                          http://www.i18nGuy.com
                             
    XenCraft                            http://www.XenCraft.com
    Making e-Business Work Around the World
    -------------------------------------------------------------
    

