[Corpora-List] Three new resources (plus one) related to English word frequency and collocates

Mark Davies Mark_Davies at byu.edu
Thu Feb 18 15:41:17 UTC 2010


Here are three new resources (plus an upcoming one) related to English word frequency and collocates, which might be of interest to some of you. For more information, please see http://www.wordfrequency.info/


New this week:


1. A (PDF) eBook containing the top 20,000 lemmas of American English, in rank-frequency order (based on the 400-million-word Corpus of Contemporary American English<http://www.americancorpus.org/>, 1990-2009). It also contains 20-30 collocates for each word, plus synonyms, indications of genre variation, etc.
[See a sample<http://www.wordfrequency.info/files/entries.pdf>]



2. The book A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates, and Thematic Lists (Davies and Gardner, 2010, Routledge), which is now available from Routledge<http://www.routledgelanguages.com/books/A-Frequency-Dictionary-of-Contemporary-American-English-isbn9780415490634>, Amazon<http://www.amazon.com/Frequency-Dictionary-Contemporary-American-English/dp/0415490634/ref=sr_1_1?ie=UTF8&s=books&qid=1264117499&sr=8-1>, and elsewhere. It is part of the Routledge series of frequency dictionaries<http://www.routledge.com/books/series/routledge_frequency_dictionaries>. It covers the top 5,000 lemmas of American English, with 20-30 collocates, synonyms, and genre distribution for each word. It also contains more than thirty thematically oriented, frequency-ranked lists, such as new words in American English, American-British differences, phrasal verbs, genre-based differences, and thematic vocabulary.
[See a sample<http://www.wordfrequency.info/files/book.pdf>]

3. A free frequency-ranked list<http://www.wordfrequency.info/free> of the top 5,000 lemmas of American English, based on the Corpus of Contemporary American English.

Upcoming (March 17):


4. Same as #1, but with the top 200-300 collocates per word. Unlike #1, these files can be edited, copied from, etc. (Note: the files are ready now, but cannot be distributed for another month.)
[See a sample<http://www.wordfrequency.info/files/entriesWithCollocates.zip>; 32 MB]

In addition, 20,000-word frequency lists<http://www.wordfrequency.info/files/entriesWithoutCollocates.zip> containing only lemma, part of speech, and frequency (i.e. no collocates) will also be available, as will full bigram and trigram files from the Corpus of Contemporary American English.

Here are three sample entries (from among the 20,000 in the eBook) for #1 above (see a much larger sample<http://www.wordfrequency.info/files/entries.pdf>):

1421 blow v
noun  wind, whistle, air, nose, smoke, breeze, face, hair, kiss, head, window, horn, candle, mind, storm
misc  away, through, across
out  candle, window, breath, air, wind, smoke, knee, tire, match
up  building, plot, bomb, plane, car, bridge, wind, threaten
off  steam, head, roof, leg
● whoosh, gust, waft, puff || move, propel, drive, carry
27254 | 0.94 F

10129 shimmering j
noun  light, water, heat, hair, sun, sea, surface, silver, glass, wave, color
misc  blue, white, across, above, green, golden, wear, red, dark, rise, yellow, beyond
● iridescent, sparkling, shining, gleaming, glistening, glittering
1555 | 0.90 F

18669 pathos n
adj  full, human, genuine, pure, sympathetic, comic, final, deep, Greek, tragic
noun  humor, tragedy, comedy, sense, appeal, suffering, emotion, ethos, scene
verb  evoke, reflect, avoid, generalize, capture, experience, arouse
● sadness, bleakness, despair, tragedy, anguish
473 | 0.90 A



============================================

Mark Davies

Professor of (Corpus) Linguistics

Brigham Young University

(phone) 801-422-9168 / (fax) 801-422-0906



http://davies-linguistics.byu.edu



** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **

============================================





_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora