From: Ben Dougall (bend@freenet.co.uk)
Date: Tue May 13 2003 - 08:19:29 EDT
> All I need is one thing:
>
> What I actually look for is a way to check files about the encoding
> they are encoded in. Is there a SW that just tells me: This text is
> encoded in UTF8, ASCII, UCS2 or whatever?
i would also like to do the same thing, so if you find anything more
useful than what i point to here, i'd really appreciate it if you could
let me know.
have a look at the very recent thread on this list, in the archives:
"suggestions for strategy on dealing with plain text in potentially any
(unspecified) encoding?" there's a lot of useful stuff in that.
basically, nearly all text encodings just go ahead and use their
encoding without first stating "i'm 7-bit ascii" or whatever (even
unicode, when it doesn't use a bom). so the required info often simply
isn't there. some html, most (maybe all) xml, some unicode (via a bom)
and most (maybe all) emails carry information about which encoding is
being used.
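
to illustrate the bom case, here is a minimal python sketch, not a
complete detector. the function name sniff_bom and the table BOMS are
made up for this example; the BOM_* constants come from python's
standard codecs module. it only reports what a leading bom claims, and
returns None for the many files that have no bom at all.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

one caveat: utf-16-le text whose first character after the bom is U+0000
starts with the same four bytes as the utf-32-le bom, so even this
answer is not fully certain.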
so it seems that if anything is going to tell you explicitly which
encoding is being used, it's going to be the text format rather than
the encoding itself (apart from unicode and its boms). if neither the
text format nor the encoding itself specifies the encoding, i don't
think there is any absolute, sure way to find out. but there are
various methods to make good, educated guesses (see the thread i
mentioned).
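
as a rough idea of what such a guess can look like, here is a small
python sketch. it is my own illustration, not a method taken from that
thread, and the function name guess_encoding and the label
"unknown-8bit" are invented for it. it relies on two observations:
bytes that are all below 0x80 are consistent with ascii, and text in a
legacy 8-bit encoding rarely happens to be valid utf-8. neither answer
is proof, since 7-bit data could also be something like iso-2022-jp.

```python
def guess_encoding(data: bytes) -> str:
    """Return an educated guess at the encoding; never a certainty."""
    # Every byte below 0x80: consistent with plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Non-ASCII bytes that form valid UTF-8 sequences are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some other encoding; telling which one needs further heuristics.
        return "unknown-8bit"
```

for example, guess_encoding(b"h\xc3\xa9llo") gives "utf-8", while
guess_encoding(b"h\xe9llo") (the same word in latin-1) gives
"unknown-8bit".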
also someone on this list pointed me to this which you might find
useful:
<http://www.mlmassociates.cc/dl-win32.htm>
Dcpcmd is a command line program that illustrates using the Windows
IMultiLanguage interface to detect a code page. Several sample text
files are provided.