[Corpora-List] Frequency lists (corrected)

Mark Davies Mark_Davies at byu.edu
Mon Feb 23 17:41:42 UTC 2009

There are also frequency lists for American English (based on COCA -- a balanced corpus of nearly 400 million words), TIME Magazine (100m words, 1920s-2000s), Spanish (20m words, 1900s) and Portuguese (20m words, 1900s). Also available are n-grams for all of these languages (as well as for the BNC). See:


Also, later this year Routledge will publish a printed frequency dictionary. It will include the top 5,000 lemmas in American English (from COCA), the top 20-30 collocates of each lemma (grouped by PoS and function: subj/obj, etc.), and indications of genre-based variation.

Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
Corpora mailing list
Corpora at uib.no
