Multi-lingual corpus?

From: Ken Krugler (ken@transpac.com)
Date: Wed Aug 24 2005 - 11:51:33 CDT

Next message: Neelesh Bodas: "Re: Unicode TTF question"

Previous message: Philippe Verdy: "ISO 639-3 language variants for French"
Next in thread: Philippe Verdy: "Re: Multi-lingual corpus?"
Reply: Philippe Verdy: "Re: Multi-lingual corpus?"
Maybe reply: jarkko.hietaniemi@nokia.com: "RE: Multi-lingual corpus?"

Hi all,

Kevin Burton has created an open source language detector written in
Java (see <http://www.feedblog.org/2005/08/ngram_language_.html>)
and he's asking for contributions of sample data for additional
languages.

Any suggestions for a multi-lingual corpus that could be used as
training data? I believe he used some Wikipedia entries, but I'm
hoping there are larger and more complete public data sets available.

Thanks,

-- Ken

-- 
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200

This archive was generated by hypermail 2.1.5 : Wed Aug 24 2005 - 12:02:48 CDT