[Corpora-List] Word frequencies in English, French, German, Spanish, Dutch, Italian and Portuguese

Marco Baroni marco.baroni at unitn.it
Mon Feb 12 18:50:30 UTC 2007


You can also extract various types of frequency lists from the Italian la 
Repubblica corpus here:

http://sslmitdev-online.sslmit.unibo.it/corpora/frequency.php?path=&name=Repubblica

They are not balanced like the CoLFIS list, but they come from a much 
larger corpus (about 400M tokens).

Regards,

Marco


Isabella Chiari wrote:
> For Italian, the largest (sigh...) word frequency list available is the one
> from the Corpus e Lessico di Frequenza dell'Italiano Scritto (CoLFIS), built
> from a corpus of 3,150,075 tokens of written language.
> You can freely download the lists in various formats at:
> http://www.istc.cnr.it/material/database/colfis/index_eng.shtml 
> The corpus is partially available for search at:
> http://www.ge.ilc.cnr.it/page.php?ID=archCoLFIS&lingua=it
> 
> Ref. Laudanna, A., Thornton, A.M., Brown, G., Burani, C. e Marconi, L.
> (1995). Un corpus dell'italiano scritto contemporaneo dalla parte del
> ricevente. In S. Bolasco, L. Lebart e A. Salem (a cura di), III Giornate
> internazionali di Analisi Statistica dei Dati Testuali. Volume I,
> pp.103-109. Roma: Cisu
> 
> Best wishes,
> Isabella Chiari
> 
> 
> Isabella Chiari
> 
> Università La Sapienza di Roma
> Dipartimento di Studi Filologici, Linguistici e Letterari (DSFLL)
> dell’Università di Roma “La Sapienza” 
> P.le Aldo Moro, 5, III Piano, Edificio ex Facoltà di Lettere e Filosofia,
> 00185 Roma, tel. +30 06 4991 3575
> e-mail: isabella.chiari at uniroma1.it
> Home page Alphabit www.alphabit.net
> Alphabit blog / Glottophilia blog
> 
> 
> 
> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Yorick Wilks
> Sent: Monday, 12 February 2007 17:37
> To: corpora at lists.uib.no
> Subject: [Corpora-List] Word frequencies in English, French, German,
> Spanish, Dutch, Italian and Portuguese
> 
> Does anyone know easily accessible sources of these?
> Yorick Wilks
> Sheffield
> 
> 

-- 
Marco Baroni
CIMeC, University of Trento
http://www.form.unitn.it/~baroni


