From: Chris Pratley (chrispr@Exchange.Microsoft.com)
Date: Thu Jan 08 2004 - 16:45:47 EST
If you are on the Windows platform, look at mlang.dll, and at the
IMultiLanguage2 and IMultiLanguage3 APIs, which provide this service. As
others have noted, you will get false detections with too little or
ambiguous data, but you may be quite surprised at just how accurate this
detection is (sometimes just one character outside of the "ASCII"
repertoire is enough), since it uses language frequency data as well as
encoding rules.
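
To illustrate why encoding rules alone are not enough, here is a minimal
Python sketch (not mlang, and not how mlang works internally). The helper
name plausible_encodings and the candidate list are assumptions made up
for this example. It only applies the "encoding rules" half: it reports
which candidate encodings can decode the bytes without error.

```python
# Hypothetical helper for illustration only; this is not the mlang API.
# It applies encoding rules alone: an encoding is "plausible" if the
# bytes decode under it without error.
CANDIDATES = ["ascii", "utf-8", "shift_jis", "mac_roman", "cp1252"]


def plausible_encodings(data: bytes, candidates=CANDIDATES):
    """Return the candidate encodings that decode `data` without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: invalid sequences raise
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


if __name__ == "__main__":
    # Pure ASCII text is valid in every candidate, so it is ambiguous.
    print(plausible_encodings(b"hello"))
    # Two non-ASCII bytes rule out ASCII but still leave several candidates.
    print(plausible_encodings("\u00e9".encode("utf-8")))
```

Several encodings usually survive this test, which is where language
frequency data is needed to pick the most likely one.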
Chris
-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Brijesh Sharma
Sent: January 8, 2004 3:08 AM
To: Unicode Mailing List
Subject: Detecting encoding in Plain text
Hi All,
I am new to Unicode.
I am writing a small tool to load text from a .txt file into an edit box.
This file could be in any encoding, e.g. UTF-8, UTF-16, Mac Roman,
Windows ANSI, Western (ISO-8859-1), JIS, Shift-JIS, etc.
I can distinguish UTF-8 and UTF-16 from each other using the BOM,
but how do I auto-detect the others?
Any kind of help will be appreciated.
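
For reference, the BOM check described above might look like the
following Python sketch. The function name sniff_bom and the table BOMS
are assumptions made up for this example. UTF-32 is deliberately not
handled, and its little-endian BOM begins with the same two bytes as the
UTF-16 little-endian BOM, so such a file would be misreported here.

```python
import codecs

# (BOM bytes, Python codec name to decode with). The "utf-8-sig" and
# "utf-16" codecs both consume the BOM while decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if `data` starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    sample = "hi".encode("utf-16")  # Python prepends a BOM for "utf-16"
    codec = sniff_bom(sample)
    print(codec, sample.decode(codec))
```

A result of None means no BOM was found, which is exactly the case the
question is about.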
Regards
Brijesh Sharma
"You're not obligated to win. You're obligated to keep trying to do the
best you can every day."