From: jarkko.hietaniemi@nokia.com
Date: Thu Aug 25 2005 - 03:33:23 CDT
> I tried his demo page just with French, and the conclusions
> are not good.
> - starting with "essai", it replied Finnish
> - extending it to "un essai", it replied Romanian
> - extending it to "un essai long", "un essai plus long", or
>   "un essai encore plus long", it replied "rumantsh"
> - extending it to "ceci est un essai long", "ceci est un essai trop long",
>   "ceci est un essai encore trop long", or "ceci est un essai suffisant",
>   it again replied Romanian...
I think you are being much too harsh in your judgment. You would do well to sit
down for a moment and think about what it does, based on what input, and what it
outputs. Or, instead, you could have some fun and see what it does:
    a          irish
    au         welsh
    auk        malay
    auke       german
    aukea      basque
    aukeam     malay
    aukeama    swahili
    aukeamaa   sanskrit
    aukeamaan  finnish
('aukeamaan' is a valid Finnish word.) My main point, I guess, is this: take a look
at the replies. 'a' is a valid word in MANY languages, but the demo replies with only
one. Ditto for 'au', 'auk', and 'auke'. 'aukea', 'aukeama', and 'aukeamaa' are also
valid Finnish words, but apparently they could be Basque, Swahili, and Sanskrit.
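To make the point about single answers concrete, here is a minimal sketch of a character-bigram language guesser, assuming a naive-Bayes-style scorer with add-one smoothing and a uniform prior. The training snippets in `TRAINING` and the helper names `bigrams`, `train`, and `rank` are hypothetical toy material, not the demo's data or method. Unlike the demo, `rank` returns every candidate language with a normalized score, so a near-tie on a one- or two-letter prefix stays visible instead of being collapsed into a single reply.

```python
import math
from collections import Counter

# Hypothetical toy training text (real systems use large corpora per language).
# The Finnish snippet deliberately contains 'aukeamaan' and its prefixes.
TRAINING = {
    "finnish": "aukeamaan aukea talo on kaunis ja kirja on auki",
    "basque": "etxea handia da eta mendia urruna dago",
    "malay": "rumah itu besar dan buku itu ada di atas meja",
}

def bigrams(text):
    """Character bigrams of each word, with '_' marking word boundaries."""
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        grams.extend(padded[i:i + 2] for i in range(len(padded) - 1))
    return grams

def train(corpora):
    """Return per-language bigram counts plus the shared bigram vocabulary."""
    models = {lang: Counter(bigrams(text)) for lang, text in corpora.items()}
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    return models, vocab

def rank(text, models, vocab):
    """Score every language (uniform prior, add-one smoothing) and
    return (language, posterior) pairs sorted best first."""
    grams = bigrams(text)
    log_scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        # V + 1 leaves some probability mass for bigrams seen in no training text.
        denom = total + len(vocab) + 1
        log_scores[lang] = sum(math.log((counts[g] + 1) / denom) for g in grams)
    # Turn log scores into posteriors, subtracting the max for numerical stability.
    top = max(log_scores.values())
    weights = {lang: math.exp(s - top) for lang, s in log_scores.items()}
    z = sum(weights.values())
    return sorted(((lang, w / z) for lang, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    models, vocab = train(TRAINING)
    word = "aukeamaan"
    for n in range(1, len(word) + 1):
        prefix = word[:n]
        print(prefix, [(lang, round(p, 2)) for lang, p in rank(prefix, models, vocab)])
```

Running it prints the full ranking for each prefix of 'aukeamaan', mirroring the experiment above. With corpora this small the numbers themselves mean little; the point is the shape of the output, which lets the caller see how little separates the candidates on short input.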
I believe a relatively simple exercise in statistics, playing with typical n-gram frequencies,
shows that you need dozens of letters of input to get reasonably reliable results.
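The "dozens of letters" estimate can be sanity-checked with a back-of-the-envelope model. As a deliberate simplification, assume each language emits characters independently from its own letter distribution (real n-gram models condition on the preceding characters). Then each character adds, on average, D_KL(P||Q) nats to the log-likelihood ratio favoring the true language P over a rival Q, so the expected ratio reaches likelihood odds of `odds` after about ln(odds)/D_KL(P||Q) characters. The distributions `lang_p` and `lang_q` below are made-up toy numbers over a four-letter alphabet, not measured letter frequencies, and the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats; p and q map symbols to probabilities."""
    return sum(pi * math.log(pi / q[s]) for s, pi in p.items() if pi > 0)

def chars_needed(p, q, odds=100.0):
    """Characters needed for the *expected* log-likelihood ratio to reach
    ln(odds), assuming i.i.d. characters (a deliberate simplification)."""
    return math.ceil(math.log(odds) / kl_divergence(p, q))

# Hypothetical toy letter distributions over a 4-symbol alphabet.
lang_p = {"a": 0.50, "e": 0.10, "k": 0.30, "u": 0.10}
lang_q = {"a": 0.30, "e": 0.30, "k": 0.20, "u": 0.20}

if __name__ == "__main__":
    d = kl_divergence(lang_p, lang_q)
    print(f"D_KL = {d:.4f} nats/char, need about {chars_needed(lang_p, lang_q)} chars")
```

Because the required length scales as 1/D_KL, the less two languages differ letter by letter, the longer the input has to be. This is an expected-value estimate only, not a bound on the error rate.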