[Corpora-List] announcing pukwac and wackypedia
Marco Baroni
marco.baroni at unitn.it
Sat Dec 19 13:43:52 UTC 2009
Dear All,
We are happy to announce that you can download two new resources from
the site of WaCky (Web as Corpus kool ynitiative):
http://wacky.sslmit.unibo.it/
1) pukWaC: the ukWaC corpus, a 2-billion-word Web-derived corpus of English,
now enriched with a full dependency parse (POS-tagging and lemmatization
done with the TreeTagger, parsing done with the MaltParser);
2) WaCkypedia: a full 2009 English Wikipedia dump (about 800 million
tokens), POS-tagged, lemmatized and dependency parsed with the same
tools used for pukWaC.
Please visit the site for details.
Enjoy!
The WaCkies:
- Alessandro Lenci (University of Pisa)
- Silvia Bernardini, Adriano Ferraresi, Eros Zanchetta (University of
Bologna)
- Marco Baroni (University of Trento)
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora