[Corpora-List] Announcing the French WaCky corpus: frWaC
Adriano Ferraresi
adriano at sslmit.unibo.it
Thu Apr 8 08:48:53 UTC 2010
Dear corpora members,
we are happy to announce that we've recently completed work on frWaC, a new corpus resource for French.
Like deWaC (for German), itWaC (for Italian) and ukWaC (for English), frWaC is a mega-corpus (~1.6 billion words) obtained by crawling and post-processing Web data. It is available both in a plain text version and in an annotated version, which includes Part-of-Speech and lemma information. An earlier version of the corpus, and the procedure for its construction, are described here:
Ferraresi, A., S. Bernardini, G. Picci and M. Baroni (2010) “Web Corpora for Bilingual Lexicography: A Pilot Study of English/French Collocation Extraction and Translation”. In Xiao, R. (ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing.
For more details on the corpus and how to obtain it, please visit the WaCky project website:
http://wacky.sslmit.unibo.it/
Best,
The WaCkies