Re: Detecting encoding in Plain text

From: Katsuhiko Momoi (momoi@alumni.indiana.edu)
Date: Fri Jan 09 2004 - 20:25:10 EST

    Peter Jacobi wrote:

    >Katsuhiko Momoi wrote:
    >
    >
    >>The specific URL for our IUC 19 paper with an update note at the
    >>beginning is this:
    >>
    >>http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
    >>
    >>
    >
    >from said paper:
    ><cite>
    >[UTF8] is inactive
    >[SJIS] is inactive
    >[EUCJP] detector has confidence 0.950000
    >[GB2312] detector has confidence 0.150852
    >[EUCKR] is inactive
    >[Big5] detector has confidence 0.129412
    >[EUCTW] is inactive
    >[Windows-1251 ] detector has confidence 0.010000
    >[KOI8-R] detector has confidence 0.010000
    >[ISO-8859-5] detector has confidence 0.010000
    >[x-mac-cyrillic] detector has confidence 0.010000
    >[IBM866] detector has confidence 0.010000
    >[IBM855] detector has confidence 0.010000
    ></cite>
    >
    >Is there any hidden preference in Mozilla to make these statistics
    >visible?
    >
    The first step is to use a debug build, which you need to build from
    source. A debug build can write debug output to a console. But as I
    recall, Shanjian disabled Chardet's output at some point. Let me CC him
    so that he can tell you the rest.

    - Kat

    -- 
    Katsuhiko Momoi
    e-mail: katmomoi@pacbell.net
    


    This archive was generated by hypermail 2.1.5 : Fri Jan 09 2004 - 20:59:58 EST