RE: Invalid unicode character, not found

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Aug 24 2001 - 15:26:44 EDT


Benjamin,

The highest Unicode code point is 0x0010FFFF, and 0x001ACBBC is larger than
the top end of the Unicode range. I don't know how you got it in your data.
UTF-16 surrogates cannot generate a character that big, since a surrogate
pair tops out at 0x0010FFFF. Bad UTF-8 might, if the decoder does not check
ranges. For example, a UTF-8 byte sequence of 0xF6, 0xAC, 0xAE, 0xBC would
decode to 0x001ACBBC (if my calculations are correct).
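
The arithmetic for those four bytes can be checked with a minimal Java
sketch. It assumes a decoder that simply strips the UTF-8 marker bits and
never validates the result; NaiveUtf8 and decodeFourBytes are hypothetical
names used for illustration, not anything taken from Xerces.

```java
// Hypothetical helper for illustration only; not part of Xerces or the JDK.
class NaiveUtf8 {
    /** Combines a 4-byte UTF-8-style sequence without any range checks. */
    static int decodeFourBytes(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18)   // 3 payload bits from the lead byte
             | ((b1 & 0x3F) << 12)   // 6 payload bits from each trail byte
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int cp = decodeFourBytes(0xF6, 0xAC, 0xAE, 0xBC);
        System.out.println("0x" + Integer.toHexString(cp));    // prints 0x1acbbc
        // The JDK range check: valid code points are 0 through 0x10FFFF.
        System.out.println(Character.isValidCodePoint(cp));    // prints false
    }
}
```

Character.isValidCodePoint is the kind of range check a careful decoder
would apply before accepting the value.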

It could be that the converter is taking code page data as UTF-8 data.
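
As a rough illustration of that scenario, the sketch below encodes a short
string as ISO-8859-1 and then asks the JDK's strict UTF-8 decoder whether
the bytes are well formed. The sample text is purely hypothetical (nothing
is known about the actual request data), and CodePageAsUtf8 and
isWellFormedUtf8 are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class CodePageAsUtf8 {
    /** Returns true if the JDK's strict decoder accepts the bytes as UTF-8. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;   // malformed sequence, e.g. an invalid lead byte
        }
    }

    public static void main(String[] args) {
        // Assumed sample: four Latin-1 characters whose ISO-8859-1 bytes are
        // F6 AC AE BC. Escapes keep this source file plain ASCII.
        String latin1Text = "\u00F6\u00AC\u00AE\u00BC";
        byte[] bytes = latin1Text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isWellFormedUtf8(bytes));   // prints false
    }
}
```

A decoder that rejects such input up front reports a malformed byte
sequence, rather than passing an out-of-range value on to the parser.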

Carl

> -----Original Message-----
> From: Israel, Benjamin [mailto:Benjamin.Israel@tfn.com]
> Sent: Friday, August 24, 2001 11:51 AM
> To: 'cbrown@xnetinc.com'
> Subject: FW: Invalid unicode character, not found
>
>
> Or rather, I just could not identify the unicode character 0x1acbbc in the
> xml request!!! So why did I get this exception?
>
> -----Original Message-----
> From: Israel, Benjamin
> Sent: Friday, August 24, 2001 2:49 PM
> To: 'Carl W. Brown'
> Subject: RE: Invalid unicode character, not found
>
>
> Carl,
>
> Thanks for your reply.
>
> But is there a Unicode character 0x1acbbc in existence? If so, what is it?
>
> I could see nothing wrong in the XML request, which was submitted by a
> user, and the same request could be viewed in Internet Explorer 5.5.
> On the Apache web server, however, where my Java code uses the Xerces
> parser, the same XML request crashed with the given exception.
>
> I just could not find the unicode character 0x1acbbc in the xml request!!!
> So why did I get this exception?
>
> Thanks,
>
> Benjamin
>
>
>
>
>
> -----Original Message-----
> From: Carl W. Brown [mailto:cbrown@xnetinc.com]
> Sent: Friday, August 24, 2001 2:45 PM
> To: unicode@unicode.org
> Cc: Israel, Benjamin
> Subject: RE: Invalid unicode character, not found
>
>
> Benjamin,
>
> Since the highest possible valid Unicode character is 0x0010FFFF, I can
> see why you got an exception.
>
> Good luck,
>
> Carl
>
> > -----Original Message-----
> > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> > Behalf Of Magda Danish (Unicode)
> > Sent: Friday, August 24, 2001 10:47 AM
> > To: unicode@unicode.org
> > Cc: Benjamin.Israel@tfn.com
> > Subject: FW: Invalid unicode character, not found
> >
> >
> > Does anyone on the list have an answer to this question?
> > Thanks.
> > Magda.
> >
> > -----Original Message-----
> > From: Israel, Benjamin [mailto:Benjamin.Israel@tfn.com]
> > Sent: Friday, August 24, 2001 8:25 AM
> > To: 'info@unicode.org'
> > Subject: Invalid unicode character, not found
> >
> >
> > To whom it may concern,
> >
> > I would like to know the character equivalent of 0x1acbbc. I could not
> > find it at http://www.unicode.org/charts/charindex.html, and my code
> > throws the following exception:
> >
> > org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1acbbc) was found in the element content of the document.
> >     at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1196)
> >     at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:644)
> >     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1360)
> >     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
> >     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
> >
> > A reply would be of great help to me.
> >
> > Thanks & regards,
> >
> > Benjamin
> >


