[Corpora-List] R: General Italian wordlist

Isabella Chiari isabella.chiari at uniroma1.it
Thu Nov 15 15:24:14 UTC 2007


For Italian there are some frequency word lists available
from the Corpus e Lessico di Frequenza dell'Italiano Scritto (CoLFIS), based on a
corpus of 3,150,075 tokens of written language.
You can freely download the lists in various formats (txt, xls, mdb...) at:
http://www.istc.cnr.it/material/database/colfis/index_eng.shtml
The corpus is partially available for search at:
http://www.ge.ilc.cnr.it/page.php?ID=archCoLFIS&lingua=it

Ref.: Laudanna, A., Thornton, A.M., Brown, G., Burani, C. and Marconi, L.
(1995). Un corpus dell'italiano scritto contemporaneo dalla parte del
ricevente. In S. Bolasco, L. Lebart and A. Salem (eds.), III Giornate
internazionali di Analisi Statistica dei Dati Testuali, Volume I,
pp. 103-109. Roma: Cisu.

It is not as large as the Repubblica corpus, but CoLFIS is balanced and
contains not only newspaper texts, but also novels, essays, magazines, etc.
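
For anyone who wants to compute keyness outside WordSmith, here is a minimal
Python sketch of how a keyword list could be derived from a reference
frequency list. It uses the log-likelihood statistic, which is one common
choice for keyness. The function names and the toy counts are hypothetical
and only for illustration: they are not CoLFIS figures, and the downloaded
lists would first have to be read into a word-to-frequency mapping.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for a word occurring a times in a study corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def keywords(study_counts, ref_counts, ref_total=None, top=20):
    """Rank words that are relatively more frequent in the study corpus."""
    study_total = sum(study_counts.values())
    if ref_total is None:
        # Assumption: the reference list covers the whole reference corpus.
        ref_total = sum(ref_counts.values())
    scored = []
    for word, a in study_counts.items():
        b = ref_counts.get(word, 0)
        # Keep positive keywords only (over-represented in the study corpus).
        if a / study_total > b / ref_total:
            scored.append((word, log_likelihood(a, b, study_total, ref_total)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Toy counts, invented purely for illustration (not CoLFIS data).
study = Counter({"mare": 30, "casa": 20, "di": 50})
reference = Counter({"mare": 10, "casa": 200, "di": 5000})
result = keywords(study, reference)
```

If the reference list is truncated (for example, only words above a
frequency threshold), it is probably safer to pass the full corpus size
as ref_total rather than summing the listed frequencies.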

Best wishes,
Isabella Chiari

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
jane.johnson at libero.it
Sent: Wednesday, 14 November 2007 16:12
To: CORPORA
Subject: [Corpora-List] General Italian wordlist

I am looking for a wordlist generated from a general corpus of contemporary
Italian, similar to BNC_World.lst for use with the Keyword tool of the
WordSmith suite, in order to create a Keyword list for a selection of Italian
novels. Can anyone point me in the right direction? Thanks,
Jane Johnson
University of Bologna


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora