[Corpora-List] announcing pukwac and wackypedia

Marco Baroni marco.baroni at unitn.it
Sat Dec 19 13:43:52 UTC 2009


Dear All,

We are happy to announce that you can download two new resources from 
the site of WaCky (Web as Corpus kool ynitiative):

http://wacky.sslmit.unibo.it/

1) pukWaC: the ukWaC corpus, a 2-billion-word Web-derived corpus of English,
now enriched with a full dependency parse (POS-tagging and lemmatization 
done with the TreeTagger, parsing done with the MaltParser);

2) WaCkypedia: a full 2009 English Wikipedia dump (about 800 million 
tokens), POS-tagged, lemmatized and dependency parsed with the same 
tools used for pukWaC.

Please visit the site for details.

Enjoy!

The WaCkies:

- Alessandro Lenci (University of Pisa)

- Silvia Bernardini, Adriano Ferraresi, Eros Zanchetta (University of 
Bologna)

- Marco Baroni (University of Trento)



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


