[Corpora-List] Announcing the French WaCky corpus: frWaC

Adriano Ferraresi adriano at sslmit.unibo.it
Thu Apr 8 08:48:53 UTC 2010


Dear corpora members,

we are happy to announce that we've recently completed work on frWaC, a new corpus resource for French. 

Like deWaC (for German), itWaC (for Italian) and ukWaC (for English), frWaC is a mega-corpus (~1.6 billion words) obtained by crawling and post-processing Web data. It is available both in a plain-text version and in an annotated version that includes part-of-speech and lemma information. An earlier version of the corpus, and the procedure for its construction, are described here:

Ferraresi, A., S. Bernardini, G. Picci and M. Baroni (2010) “Web Corpora for Bilingual Lexicography: A Pilot Study of English/French Collocation Extraction and Translation”. In Xiao, R. (ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing.

For more details on the corpus and how to obtain it, please visit the WaCky project website:

http://wacky.sslmit.unibo.it/ 

Best,

The WaCkies 