Re: Detecting encoding in Plain text

From: jon@hackcraft.net
Date: Thu Jan 08 2004 - 07:09:01 EST

Next message: D. Starner: "Re: Detecting encoding in Plain text"

Previous message: John Delacour: "Re: Detecting encoding in Plain text"
In reply to: Brijesh Sharma: "Detecting encoding in Plain text"
Next in thread: John Delacour: "Re: Detecting encoding in Plain text"
Reply: John Delacour: "Re: Detecting encoding in Plain text"

> I am writing a small tool to get text from a txt file into an edit box.
> Now this txt file could be in any encoding, e.g. UTF-8, UTF-16, Mac
> Roman, Windows ANSI, Western (ISO-8859-1), JIS, Shift-JIS, etc.
> My problem is that I can distinguish between UTF-8 and UTF-16 using the BOM,
> but how do I auto-detect the others?
> Any kind of help will be appreciated.

There is no foolproof way of differentiating between some of these encodings.
UTF-16 or UTF-8 with a BOM "stands out" as unlikely to be in any other
encoding (though such files don't necessarily start with a BOM, by the way),
but the others are more troublesome.
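
To make the BOM case concrete, here is a minimal sketch in Python. The function
name sniff_bom is made up for illustration, and it only looks for the UTF-8 and
UTF-16 signatures mentioned above. A result of (None, 0) means "no BOM found",
not "not Unicode", since a BOM is optional.

```python
import codecs

# Byte order marks for the encodings discussed above.
# Assumption: UTF-32 and other signatures are deliberately not handled here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The returned length lets the caller skip the BOM before decoding the rest of
the data.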

If there is no source of encoding information (such as you get with XML
declarations, HTTP headers and such), and even if there is, it may be best to
offer your users the ability to select encodings (perhaps with the default
choice based on locale settings).
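
As a rough sketch of that approach in Python: offer a list of encodings with
the locale's preferred one first, and decode with whatever the user picks. The
names CANDIDATES, default_encoding, encoding_menu and decode_with_choice are
hypothetical, and mapping "Windows ANSI" to cp1252 and "JIS" to iso2022_jp is
an assumption on my part (the right code page depends on the system).

```python
import codecs
import locale

# Python codec names for the encodings named in the question.
# Assumption: "Windows ANSI" is taken as cp1252 and "JIS" as iso2022_jp.
CANDIDATES = ["utf-8", "utf-16", "mac_roman", "cp1252",
              "iso-8859-1", "iso2022_jp", "shift_jis"]


def default_encoding() -> str:
    """Return the locale's preferred encoding, or ISO-8859-1 if it is unusable."""
    name = locale.getpreferredencoding(False)
    try:
        return codecs.lookup(name).name
    except LookupError:
        return "iso-8859-1"


def encoding_menu() -> list:
    """List the encodings to offer the user, with the locale default first."""
    default = default_encoding()
    rest = [c for c in CANDIDATES if codecs.lookup(c).name != default]
    return [default] + rest


def decode_with_choice(data: bytes, choice=None):
    """Decode with the user's choice, or with the locale default if none is given."""
    encoding = choice or default_encoding()
    # errors="replace" keeps a wrong guess visible on screen instead of raising.
    return data.decode(encoding, errors="replace"), encoding
```

For example, decode_with_choice(data, "shift_jis") applies an explicit
selection, while decode_with_choice(data) falls back to the locale default.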

--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*


This archive was generated by hypermail 2.1.5 : Thu Jan 08 2004 - 08:58:48 EST