From: Doug Ewell (dewell@adelphia.net)
Date: Thu Aug 11 2005 - 09:00:37 CDT
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> This is not a bad heuristic in general, but I don't think I'd suggest
>> using "a" as an indication that the text is in French. That word has
>> a tendency to occur in English now and then.
>
> I know, but it counts positively to French and English (probably more
> in English than in French where it is just a common conjugated form of
> an essential auxiliary verb). The idea is not to count single words,
> but to compute a summary statistic for lists of candidate languages,
> using lists of words rated by occurrence probability. Such a list of
> words will be much larger than the few examples I gave, and will
> include other common words and contractions.
That makes a lot more sense. Thank you for the clarification.
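For readers following the thread, the scoring idea in the quoted text can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not Philippe's actual method: the word lists, the probability values, the UNKNOWN_PROB floor, and the function names are all hypothetical, and summing log-probabilities is just one plausible choice of "summary statistic".

```python
import math
import re

# Hypothetical, hand-picked word probabilities for illustration only.
# These are NOT measured frequencies; a real list would be far larger.
WORD_PROBS = {
    "en": {"the": 0.06, "of": 0.03, "and": 0.03, "a": 0.02, "is": 0.01},
    "fr": {"de": 0.05, "le": 0.03, "la": 0.03, "et": 0.02, "est": 0.01, "a": 0.005},
}

# Assumed floor probability for a word missing from a language's list.
UNKNOWN_PROB = 1e-4


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def score_languages(text):
    """Return one summary score per candidate language.

    The score is the sum of log-probabilities of every word in the text,
    so each word contributes to every language rather than deciding alone.
    """
    words = tokenize(text)
    return {
        lang: sum(math.log(probs.get(word, UNKNOWN_PROB)) for word in words)
        for lang, probs in WORD_PROBS.items()
    }


def guess_language(text):
    """Pick the candidate language with the highest summary score."""
    scores = score_languages(text)
    return max(scores, key=scores.get)
```

With these made-up numbers, "a" appears in both lists and so adds to both scores (somewhat more for English), which is why a single occurrence of it does not settle the question; the other words in the text do.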
-- Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/