Re: HTML Validation (was Re: Clean and Unicode compliance)

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Sun Dec 16 2001 - 07:17:54 EST


At 3:07 AM -0800 12/16/01, James Kass wrote:

>Tests run on non-BMP text show no problem for Plane One using
>UTF-8 encoding but error messages are generated when these
>characters are referenced as NCRs.
>

I suspect there are a lot of scattered bugs like this waiting to be
discovered. I recently added a Plane-1 musical symbol to a book I'm
working on, and watched Xerces's XMLSerializer class trip over it. It
emitted the character as two character references, one for each half
of the surrogate pair, rather than as a single reference to the
Plane-1 code point, thus producing malformed HTML. It worked when I
switched to UTF-8 encoding though, presumably because the serializer
could then write the character directly instead of escaping it.
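
To make the failure concrete, here is a minimal sketch of the two
escaping strategies. This is not Xerces code; the class and method
names are hypothetical, the G clef (U+1D11E) is just a stand-in for
whatever Plane-1 character is involved, and it assumes a JDK recent
enough to have String.codePointAt (Java 5 or later).

```java
public final class NcrEscaper {

    // Hypothetical helper, not part of Xerces: writes each non-ASCII
    // code point as ONE hexadecimal character reference.
    // (Escaping of '<', '&' and friends is deliberately left out.)
    public static String escapeByCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // joins a valid surrogate pair
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += Character.charCount(cp);      // 2 for supplementary characters
        }
        return out.toString();
    }

    // The buggy pattern described above: one reference per UTF-16
    // code unit, so a surrogate pair becomes two references.
    public static String escapeByCodeUnit(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(c).toUpperCase())
                   .append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E";  // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(escapeByCodeUnit(clef));   // &#xD834;&#xDD1E;
        System.out.println(escapeByCodePoint(clef));  // &#x1D11E;
    }
}
```

The first form of output refers to two surrogate code points, which
are not legal characters on their own; the second refers to the single
character that was actually meant.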

I suspect a lot of our tools haven't been thoroughly tested with
Plane-1 characters and are likely to have these sorts of bugs in them.

-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java news:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML news: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+


