[Corpora-List] Free text corpora?
Raphael Mudge
raffi at automattic.com
Tue Mar 2 21:31:29 UTC 2010
Hi Xin,
A collection of plain text files of public domain books is available
from Project Gutenberg:
http://www.gutenberg.org/wiki/Main_Page
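Each Gutenberg text file wraps the book in licence boilerplate, which you will probably want to drop before building a corpus. Here is a minimal sketch, assuming the file contains the usual "*** START OF ... PROJECT GUTENBERG EBOOK ... ***" and "*** END OF ... ***" marker lines. The function name is my own, and the exact marker wording varies between files, so treat the patterns as a starting point.

```python
import re

# Assumption: the book text sits between two marker lines that start with
# "***" and mention "START OF" / "END OF" and "PROJECT GUTENBERG".
START_RE = re.compile(r"^\*\*\*\s*START OF .*PROJECT GUTENBERG.*$",
                      re.IGNORECASE | re.MULTILINE)
END_RE = re.compile(r"^\*\*\*\s*END OF .*PROJECT GUTENBERG.*$",
                    re.IGNORECASE | re.MULTILINE)


def strip_gutenberg_boilerplate(text):
    """Return the text between the START and END marker lines.

    If either marker is missing (or they appear out of order), the input
    is returned with only surrounding whitespace removed.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start is None or end is None or end.start() < start.end():
        return text.strip()
    return text[start.end():end.start()].strip()


if __name__ == "__main__":
    sample = (
        "Licence header\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "\n"
        "Call me Ishmael.\n"
        "\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "Licence footer\n"
    )
    print(strip_gutenberg_boilerplate(sample))
```

Falling back to the unmodified text when a marker is missing keeps the function from silently discarding a book whose header uses different wording.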
You can also download a Wikipedia database dump and convert it into plain text:
http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/
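To give a rough idea of what the conversion step involves, here is a small sketch that strips the most common wiki markup from a chunk of article text. It is not necessarily the approach the post above takes. The function name is hypothetical, and the regular expressions only handle simple, non-nested markup, so a real dump (nested templates, tables, infoboxes) would need a proper wikitext parser.

```python
import re


def strip_wiki_markup(text):
    """Remove simple wiki markup, leaving approximate plain text."""
    # Drop simple, non-nested templates such as {{citation needed}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop self-closing references first, then <ref>...</ref> footnotes.
    text = re.sub(r"<ref[^>]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop any remaining HTML-style tags, keeping the text between them.
    text = re.sub(r"<[^>]+>", "", text)
    # [[target|label]] becomes label; [[target]] becomes target.
    text = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", text)
    # Remove the quote runs used for italics ('') and bold (''').
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces or tabs left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    raw = ("'''Anarchism''' is a [[political philosophy|philosophy]]"
           "{{citation needed}} about [[state]]s.<ref>Smith 2001</ref>")
    print(strip_wiki_markup(raw))
```

Templates are removed before links so that link syntax inside a dropped template never reaches the output.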
If you need to mark up the corpus with part-of-speech tags, Stanford's
POS tagger may work for you:
http://nlp.stanford.edu/software/tagger.shtml
-- Raphael
Raphael Mudge
Code Wrangler, Automattic
http://www.afterthedeadline.com
On Mar 2, 2010, at 6:38 AM, Xin Yan wrote:
> Hello,
>
> can anyone tell me if there are any free text corpora available for
> commercial use?
> Thank you in advance!
>
> Best,
> Xin Yan
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora