From: jon@hackcraft.net
Date: Thu Jan 08 2004 - 07:09:01 EST
> I am writing a small tool to get text from a txt file into an edit box.
> Now this txt file could be in any encoding, e.g. UTF-8, UTF-16, Mac
> Roman, Windows ANSI, Western (ISO-8859-1), JIS, Shift-JIS etc.
> I can distinguish between UTF-8 and UTF-16 using the BOM.
> My problem is: how do I auto-detect the others?
> Any kind of help will be appreciated.
There is no foolproof way of differentiating between some of these encodings.
UTF-16 or UTF-8 text that starts with a BOM "stands out" as being unlikely to
be in any other encoding (though such files don't necessarily start with a BOM,
by the way). The others are more troublesome.
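
As a rough illustration of the BOM check, here is a minimal Python sketch. The
function name sniff_bom is made up for this example, and it only knows about
the UTF-8 and UTF-16 marks discussed above. A return value of None means
"no BOM seen", not "not Unicode".

```python
import codecs

# Byte-order marks for the two encodings discussed above, paired with the
# Python codec that will consume the mark while decoding.
# (Other BOMs, such as UTF-32's, are deliberately not handled in this sketch.)
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None
```

Only the first few bytes of the file are needed, so the check can run before
the whole file is read.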
If there is no source of encoding information (such as you get with XML
declarations, HTTP headers and the like), and even if there is, it may be best
to offer your users the ability to select an encoding (perhaps with the default
choice based on locale settings).
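
One way that precedence might look, sketched in Python. The names
choose_encoding and load_text are hypothetical, and "detected" stands for
whatever a BOM check or an external declaration produced (None if nothing was
found). The user's explicit choice wins, and the locale's preferred encoding
is only the last resort.

```python
import locale


def choose_encoding(detected=None, user_choice=None):
    """Pick an encoding: user's selection, then detection, then locale default."""
    return user_choice or detected or locale.getpreferredencoding(False)


def load_text(data: bytes, detected=None, user_choice=None):
    """Decode data for display in an edit box.

    errors="replace" is an assumption made for this sketch: a wrong guess
    shows U+FFFD replacement characters instead of raising, so the user can
    see the problem and pick another encoding.
    """
    return data.decode(choose_encoding(detected, user_choice), errors="replace")
```

Re-running load_text with a different user_choice is cheap, which is what
makes a manual encoding menu practical.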
-- Jon Hanna <http://www.hackcraft.net/> *Thought provoking quote goes here*