Re: XML Parser for Unicode Big Indian font MSWord document

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Jan 19 2004 - 14:06:07 EST

Next message: Dean Snyder: "Re: Cuneiform Free Variation Selectors"

Previous message: Michael Everson: "Re: Cuneiform Free Variation Selectors"
In reply to: N. Ganesh Babu: "XML Parser for Unicode Big Indian font MSWord document"

N. Ganesh Babu wrote:
> I having XML file in Unicode-Big Indian font created in MS Word. Please

I believe you mean that you have chosen to save a document in the "Unicode Big Endian" encoding
scheme, formally known as UTF-16BE. An encoding is different from a font.
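
To make the encoding/font distinction concrete, here is a minimal sketch in Java showing that an encoding
only decides which bytes represent each character. It assumes a modern JDK (StandardCharsets arrived in
Java 7), and the class name EncodingDemo and the helper hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Render a byte array as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin capital A (U+0041) followed by DEVANAGARI LETTER KA (U+0915).
        String text = "A\u0915";
        // Same characters, three different byte sequences. No font is involved.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41 09 15
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00 15 09
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));    // 41 E0 A4 95
    }
}
```

Which glyphs appear on screen for those characters is a separate matter, decided by whatever font the
viewing application picks.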

> let me know whether we can parse the XML file as it is with the MS Word?
> If yes please let me know the parser name.

Every XML parser that conforms to XML 1.0 must be able to handle UTF-8 and UTF-16. The latter is
best supported if the document begins with a Byte Order Mark (BOM); XML 1.0 in fact requires one for
UTF-16. I believe that Word includes the BOM when you save as "Unicode" or "Unicode Big-Endian".
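
If you want to check what Word actually wrote, a rough sketch like the following looks at the first bytes
of the file for a BOM. The class name BomSniffer, the method detect, and the returned labels are all
hypothetical, and the check deliberately ignores UTF-32, so treat it as a quick diagnostic rather than a
full encoding detector.

```java
class BomSniffer {
    // Inspect the leading bytes of a file and report which BOM, if any, is present.
    // Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "unknown" (labels chosen for this sketch).
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE"; // U+FEFF stored most significant byte first
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // U+FEFF stored least significant byte first
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // FE FF followed by "<" encoded big-endian, as a big-endian XML file would start.
        byte[] sample = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C};
        System.out.println(detect(sample)); // UTF-16BE
    }
}
```

In practice you would read the first few bytes of the saved file and pass them to detect.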

Java 1.4 contains an XML parser.
The Apache project provides the Xerces parser.
There are many others.
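
As an illustration, here is a small sketch that feeds a big-endian UTF-16 document with a BOM to the
parser bundled with Java through the standard JAXP DocumentBuilder API. The class name Utf16XmlDemo, the
helper names, and the sample document are invented for this example, and it is written for a current JDK
rather than 1.4 (it uses StandardCharsets and getTextContent).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class Utf16XmlDemo {
    // Build a tiny XML document as raw bytes: BOM (FE FF) followed by UTF-16BE text.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<greeting>\u0928\u092E\u0938\u094D\u0924\u0947</greeting>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Parse the bytes and return the text content of the root element.
    // The parser works out the encoding from the BOM and the XML declaration.
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String text = rootText(sampleBytes());
        System.out.println(text.length()); // 6 (six UTF-16 code units in the greeting)
    }
}
```

For a real file you would hand the parser a File or an InputStream instead of a byte array. Passing bytes
rather than a Reader matters here, because it lets the parser do the encoding detection itself.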

Spelling tip: big-endian, not "indian". From "end".
See http://www.unicode.org/faq/utf_bom.html

Encoding etc.:
http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/
http://www.unicode.org/reports/tr17/

I hope this helps,
markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.

