[Corpora-List] General Italian wordlist
Adam Kilgarriff
adam at lexmasterclass.com
Thu Nov 15 07:54:24 UTC 2007
Wordlists (of wordforms and of lemmas) for the following languages, from
corpora loaded into the Sketch Engine, are available at
http://sketchengine.co.uk
Web corpora: English French German Italian Japanese Spanish
Non-web corpora: Chinese English Slovene Portuguese
Web corpora are, for general-language research purposes, usually better than
newspaper corpora (the usual alternative), as they cover a much wider range
of text types (see various studies by Serge Sharoff).
For Italian we have a 2-billion-word web corpus prepared by Marco Baroni; see
Baroni and Kilgarriff 2006, "Large linguistically-processed Web corpora for
multiple languages", Proc. EACL, Trento, Italy.
<http://kilgarriff.co.uk/Publications/2006-BaroniKilg-EACL-DeWAC.pdf>
Adam Kilgarriff
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Emiliano Guevara
Sent: 14 November 2007 23:08
To: CORPORA
Subject: Re: [Corpora-List] General Italian wordlist
Dear Jane,
unfortunately, there are still no freely available or freely
manipulable "general" corpora of Italian comparable to the BNC (I
suppose what you mean is a reference corpus, balanced according to
genre and medium, large enough to be representative of the whole
language, etc.).
I guess the best you can get is wordlists generated either from web
corpora or from large unbalanced corpora such as the "La Repubblica"
corpus (see
http://dev.sslmit.unibo.it/corpora/corpus.php?path=&name=Repubblica).
The good news is: you can get all of this right at Bologna University!
I'll be happy to help you with any of these alternatives, and
possibly also to find a better way to do the keyword search than
what WSTools has to offer (when you start playing with several
million words, WSTools just chokes...).
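
For what it's worth, a keyword list of this kind can be computed outside
WSTools with a few lines of script. What follows is only a minimal sketch,
assuming the log-likelihood statistic commonly used for keyness and
assuming both wordlists have already been read into word -> frequency
tables; the function names, the min_freq cutoff and the toy counts are all
made up for illustration and do not come from any of the corpora
mentioned here.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: total tokens in the study corpus, d: total tokens in the reference corpus.
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, min_freq=2):
    """Return (word, score) pairs for words relatively more frequent in the
    study wordlist than in the reference wordlist, highest score first."""
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        if a < min_freq:  # hypothetical cutoff to skip very rare words
            continue
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy counts, invented purely for illustration.
novels = Counter({"mare": 10, "il": 50, "casa": 5})
general = Counter({"mare": 1, "il": 500, "casa": 50, "di": 449})

for word, score in keywords(novels, general):
    print(f"{word}\t{score:.2f}")
```

Since this only needs one pass over the study wordlist and dictionary
lookups in the reference list, it should scale to wordlists derived from
multi-million-word corpora.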
Cheers,
Emiliano
On 14 Nov 2007, at 16:12, jane.johnson at libero.it wrote:
> Similar to the BNC_World.lst for use with the Keyword tool of the
> WordSmith suite, I am looking for a wordlist generated from a
> general corpus of contemporary Italian to create a Keyword list
> for a selection of Italian novels. Can anyone point me in the right
> direction? thanks
> Jane Johnson
> University of Bologna
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
****************************************
Emiliano R. Guevara
Facoltà di Lingue e Lett. Straniere
Dip. di Lingue e Lett. Straniere
Università di Bologna
Via Cartoleria 5 (40124) Bologna, Italia
Homepage: http://morbo.lingue.unibo.it/
E-mail: emiliano.guevara at unibo.it
emiguevara at gmail.com
****************************************
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora