[Corpora-List] Word frequencies in English, French, German, Spanish, Dutch, Italian and Portuguese

Mark Davies Mark_Davies at byu.edu
Mon Feb 12 20:42:46 UTC 2007


For Spanish, you might consult the Routledge Frequency Dictionary of Spanish, which came out in early 2006. It contains the 5,000 most frequent lemmas in Spanish and is based on 20 million words from the late 1900s -- 1/3 spoken, 1/3 fiction, 1/3 non-fiction -- in the Corpus del Español (http://www.corpusdelespanol.org).

>> You can get word frequencies lists for the Portuguese language in 
>> Linguateca (http://www.linguateca.pt/), for instance, here:
>>      http://acdc.linguateca.pt/acesso/tokens/tokens.todos  (token list)
>>      http://acdc.linguateca.pt/acesso/tokens/lemas.todos   (lemma list)

For Portuguese, you might also consult the Corpus do Português:

     http://www.corpusdoportugues.org

You can get the top x word forms overall, by register, between registers, etc. The corpus has 45 million words, 20 million of them from the 1900s: 2m spoken, 6m fiction, 6m newspaper, and 6m academic, split 1/2 Portugal, 1/2 Brazil. In late 2007, Routledge will publish a frequency dictionary based on this data, similar to the Spanish one noted above.

Best,

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================


