From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Aug 11 2005 - 07:36:10 CDT
From: "Theo Veenker" <Theo.Veenker@let.uu.nl>
> Did you check this one? It is a Java port of Mozilla's automatic charset
> detection algorithm. The original C++ sources are provided as well.
>
> http://www.i18nfaq.com/chardet.html
Not a bad resource, but it only addresses the autodetection of East-Asian
charsets. There's nothing to help with the autodetection of European
charsets (notably all those in ISO-8859-*, even if we exclude the Windows
charsets, which are extensions of these ISO charsets).
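As an example of how a Windows charset extends its ISO counterpart, here is a minimal Python sketch for one well-known pair, ISO-8859-1 versus windows-1252 (the helper name is hypothetical). ISO-8859-1 assigns bytes 0x80 to 0x9F to C1 control codes, which almost never occur in real text, while windows-1252 puts printable characters such as curly quotes and the euro sign in most of that range.

```python
def guess_latin1_or_cp1252(data: bytes) -> str:
    """Hypothetical helper: choose between ISO-8859-1 and windows-1252.

    ISO-8859-1 maps bytes 0x80-0x9F to C1 control characters, which are
    very rare in real text; windows-1252 uses most of that range for
    printable characters, so any byte there suggests windows-1252.
    """
    if any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return "iso-8859-1"


if __name__ == "__main__":
    # 0x93/0x94 are curly double quotes in windows-1252.
    print(guess_latin1_or_cp1252(b"\x93Bonjour\x94"))  # windows-1252
    # 0xE9 is 'é' in both charsets; no byte falls in 0x80-0x9F.
    print(guess_latin1_or_cp1252(b"caf\xe9"))          # iso-8859-1
```

This only separates the two members of one pair; choosing between, say, ISO-8859-1 and ISO-8859-2 needs language statistics, since both assign letters to the same byte range.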
Also missing is detection of Vietnamese VISCII, and of the Russian/Ukrainian
charsets, which are more common than ISO-8859-5 (Cyrillic).
Add to this the need to detect legacy Mac OS charsets and DOS/OEM code pages.
Is there any project in Mozilla to add support for them? This would require
adding more accurate statistics for common European languages (notably
French and Spanish, whose text is sometimes incorrectly detected as being in
an Asian charset).
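The kind of statistics-based detection meant here can be sketched as a toy in Python: decode the bytes with each candidate single-byte charset (including a DOS/OEM code page and a legacy Mac charset, via Python's built-in codecs) and keep the candidate whose decoded text looks most like French or Spanish. The WEIGHTS table, PENALTY value and function names are hypothetical and illustrative only; real detectors typically rely on per-language character or character-pair frequency tables built from large corpora.

```python
# Hypothetical per-character weights for accented letters common in French
# and Spanish text. Illustrative values only, not measured statistics.
WEIGHTS = {
    "é": 5, "è": 3, "à": 3, "ç": 2, "ê": 2,
    "á": 2, "í": 2, "ó": 2, "ñ": 2, "ú": 1,
    "â": 1, "î": 1, "ô": 1, "û": 1, "ù": 1,
    "ë": 1, "ï": 1, "ü": 1,
}
PENALTY = -3  # any other non-ASCII character counts against a candidate

# Candidate single-byte charsets (Python codec names), in tie-breaking order.
CANDIDATES = ("iso-8859-1", "cp1252", "cp850", "mac_roman")


def score(text: str) -> int:
    """Sum the weights of non-ASCII characters; ASCII contributes nothing."""
    total = 0
    for ch in text:
        if ord(ch) < 128:
            continue
        total += WEIGHTS.get(ch.lower(), PENALTY)
    return total


def guess_charset(data: bytes) -> str:
    """Return the candidate whose decoding of data scores highest."""
    best_name, best_score = CANDIDATES[0], None
    for name in CANDIDATES:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            # e.g. cp1252 leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined
            continue
        s = score(text)
        # Strict '>' keeps the earlier candidate when scores tie.
        if best_score is None or s > best_score:
            best_name, best_score = name, s
    return best_name


if __name__ == "__main__":
    sample = "Très bien, déjà vu"
    print(guess_charset(sample.encode("cp850")))       # cp850
    print(guess_charset(sample.encode("iso-8859-1")))  # iso-8859-1
```

The penalty matters as much as the weights: the same bytes decode to plausible letters in one charset and to symbols or control codes in another, and penalizing the latter is what separates the candidates. Note that ISO-8859-1 and windows-1252 decode the Latin-1 sample identically and tie, so candidate order decides.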
But coming back to the subject of this thread, there is absolutely
nothing there for the Arabic script...
This archive was generated by hypermail 2.1.5 : Thu Aug 11 2005 - 07:37:36 CDT