From: Ken Krugler (ken@transpac.com)
Date: Wed Aug 24 2005 - 13:25:50 CDT
>>Kevin Burton has created an open source language detector written
>>in Java (see http://www.feedblog.org/2005/08/ngram_language_.html)
>>and he's asking for contributions of sample data for additional languages.
>
>Besides his blog page and the existing SourceForge project name, he
>has not provided anything for now (there's no source and no demo
>available, not even an alpha version).
The code is available via CVS. You can view it at:
http://cvs.sourceforge.net/viewcvs.py/ngramcat/ngramcat/
>I wonder if it's a good idea to provide him with such data, if he
>does not want to publish anything in fact (there may be legal issues
>with his source, notably if he used copyrighted materials such as
>the paper he is citing).
Leaving aside any legal speculations, my query was for references to
_open_ sources of text ("...public data sets...").
-- Ken
--
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200