From: Petite Abeille (petite.abeille@gmail.com)
Date: Wed May 28 2008 - 16:52:30 CDT
On May 28, 2008, at 10:49 PM, Peter Johansson wrote:
> Is the Unicode-encoded character string self-descriptive?
No.
> That is, do I need a priori knowledge that it is encoded as, for
> example, UTF-8 rather than UTF-32?
Yes.
> Or, by examining the first byte (or first few bytes) can I determine
> the encoding?
Not really, but heuristic detectors can make an educated guess:
"Encoding Detector"
http://chardet.feedparser.org/docs/faq.html
"A composite approach to language/encoding detection"
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
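
One partial exception to the "not really" above: if the data happens to start with a byte order mark (BOM), the first few bytes do hint at the encoding. The following is a minimal sketch in Python, not part of the original reply. The function name sniff_bom is hypothetical; the BOM constants come from the standard codecs module. It returns None when no BOM is present, which is the common case and the reason a priori knowledge or a heuristic detector is still needed.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None  # no BOM: the encoding cannot be read off the first bytes
```

Even a match is only a hint: UTF-16 LE text whose first character after the BOM is U+0000 would be reported as UTF-32 LE by this sketch, and most UTF-8 data carries no BOM at all.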
-- PA. http://alt.textdrive.com/nanoki/