[Corpora-List] Announcing the French WaCky corpus: frWaC
Janne Bondi Johannessen
jannebj at iln.uio.no
Sat Apr 17 09:07:46 UTC 2010
Congratulations!
I would also like to mention that we have developed a Norwegian web
corpus: NoWaC v 1.0.
The computational procedure used to collect the NoWaC corpus is largely
based on the techniques used to build the corpora published by the WaCky
initiative <http://wacky.sslmit.unibo.it/>. The NoWaC corpus was developed
by Emiliano Guevara.
Search the corpus: http://www.tekstlab.uio.no/nowac/
Read about it here: http://www.hf.uio.no/tekstlab/nowac.html
It will be properly announced later.
Best,
Janne Bondi Johannessen.
2010/4/8 Adriano Ferraresi <adriano at sslmit.unibo.it>
> Dear corpora members,
>
> we are happy to announce that we've recently completed work on frWaC, a new
> corpus resource for French.
>
> Like deWaC (for German), itWaC (for Italian) and ukWaC (for English), frWaC
> is a mega-corpus (~ 1.6 billion words) obtained by crawling and
> post-processing Web data. It is available both in a plain text version, and
> in an annotated version, which includes Part-of-Speech and lemma
> information. An earlier version of the corpus, and the procedure for its
> construction, are described here:
>
> Ferraresi, A., S. Bernardini, G. Picci and M. Baroni (2010) “Web Corpora
> for Bilingual Lexicography: A Pilot Study of English/French Collocation
> Extraction and Translation”. In Xiao, R. (ed.) Using Corpora in Contrastive
> and Translation Studies. Newcastle: Cambridge Scholars Publishing.
>
> For more details on the corpus and how to obtain it, please visit the WaCky
> project website:
>
> http://wacky.sslmit.unibo.it/
>
> Best,
>
> The WaCkies
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
--
Janne Bondi Johannessen
Professor, The Text Laboratory, ILN, http://www.hf.uio.no/tekstlab/
President, NEALT, http://omilia.uio.no/nealt/
University of Oslo
P.O.Box 1102 Blindern, N-0317 Oslo, Norway
Tel: +47 22 85 68 14, mob.: +47 928 966 34