From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Jan 19 2004 - 14:06:07 EST
N. Ganesh Babu wrote:
> I having XML file in Unicode-Big Indian font created in MS Word. Please
I believe you mean that you have chosen to save a document in the "Unicode Big Endian" encoding
scheme, formally known as UTF-16BE. An encoding is different from a font.
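To illustrate the difference: an encoding determines which bytes are stored for each character, while a font only determines how the characters are drawn. The following is a minimal sketch, assuming a current Java runtime; the class name EncodingVsFontSketch and the helper hex are made up for this example and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

final class EncodingVsFontSketch {
    /** Formats bytes as space-separated uppercase hex, e.g. "00 41". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same two characters ('A' and e-acute), three different byte sequences.
        String text = "A\u00E9";
        System.out.println("UTF-16BE: " + hex(text.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + hex(text.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-8:    " + hex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The text is identical in all three cases; only the stored bytes differ, and none of this depends on the font used to display it.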
> let me know whether we can parse the XML file as it is with the MS Word?
> If yes please let me know the parser name.
Every XML parser that conforms to XML 1.0 must be able to handle UTF-8 and UTF-16. UTF-16 is
detected most reliably when the document starts with a Byte Order Mark (BOM). I believe that Word
writes the BOM when you save as "Unicode" or "Unicode Big-Endian".
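If you want to check whether a saved file really starts with a BOM, something like the following could be used. It is only a simplified sketch: the class BomSketch and the method detectBom are hypothetical names, and it looks only at the UTF-8 and UTF-16 signatures (it does not distinguish UTF-32, whose little-endian BOM also begins with FF FE).

```java
final class BomSketch {
    /**
     * Returns "UTF-8", "UTF-16BE" or "UTF-16LE" if the given leading bytes
     * start with the corresponding byte order mark, or null if none is found.
     */
    static String detectBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE"; // U+FEFF stored most significant byte first
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE"; // U+FEFF stored least significant byte first
            }
        }
        return null;
    }
}
```

A file saved as "Unicode Big-Endian" would be expected to start with the bytes FE FF if Word does write the BOM.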
Java 1.4 contains an XML parser.
The Apache project provides the Xerces parser.
There are many others.
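As a rough illustration with the parser that ships with Java (the JAXP DocumentBuilder API), the sketch below builds a small big-endian UTF-16 document with a BOM in memory and parses it. The class name, the method names and the sample document are invented for this example; with a real file you would pass a File or InputStream to parse() instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class Utf16XmlParseSketch {
    /** Builds a tiny XML document as big-endian UTF-16 bytes preceded by the BOM FE FF. */
    static byte[] sampleUtf16BeDocument() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><greeting>caf\u00E9</greeting>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_16BE); // this charset writes no BOM itself
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    /** Parses the bytes and returns the name of the root element. */
    static String rootElementName(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The parser works out the encoding from the BOM and the XML declaration.
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName(sampleUtf16BeDocument()));
    }
}
```

No encoding is passed to the parser here; the bytes are handed over as they are, which is the point of the BOM.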
Spelling tip: it is "big-endian", not "indian". The word comes from "end".
See http://www.unicode.org/faq/utf_bom.html
Encoding etc.:
http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/
http://www.unicode.org/reports/tr17/
I hope this helps,
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.