[Corpora-List] COCA corpus: input and analyze entire texts

Mark Davies Mark_Davies at BYU.EDU
Mon Feb 13 14:26:46 UTC 2012


The following announcement is probably most relevant to those who use corpora for language teaching, and perhaps translators as well. It's written for a general audience (rather than the more sophisticated computational linguists found here on CORPORA), but it will hopefully still be of interest to some people here.



-------------------------------------
We've added a new feature at www.wordandphrase.info -- the alternative interface for COCA. You can now input an entire text (maybe a newspaper article that you've copied from a website, or something you've written), and it will give you detailed information about the words and phrases in the text. There's now no need to copy and paste individual words and phrases into the regular COCA interface -- just work seamlessly from your original text.



First, it will highlight all of the medium- and lower-frequency words in your text (based on frequency data from COCA) and create lists of these words that you can use offline. This frequency data can help language learners focus on new words, and it can show you "what the text is about" (i.e., the text-specific words). You can also have it show you the "academic" words in your text (again, based on COCA data).
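For readers curious about the general idea behind frequency-based highlighting, here is a minimal sketch. It is not how wordandphrase.info is implemented: the rank list, the band cut-offs (HIGH_MAX, MEDIUM_MAX), the bracket markup, and the function names are all hypothetical, and the toy ranks are invented placeholders rather than COCA data. Each word is looked up in a frequency-rank list, assigned to a high, medium, or low band, and the medium and low words are marked and collected into a list.

```python
import re

# Toy rank list standing in for a real frequency list.
# Hypothetical placeholder values, NOT COCA data. Rank 1 = most frequent.
TOY_RANKS = {"the": 1, "of": 2, "and": 3, "a": 4, "in": 5, "to": 6,
             "said": 30, "new": 40, "report": 400, "economy": 900,
             "inflation": 3500, "stagnation": 12000}

# Hypothetical cut-offs: rank <= HIGH_MAX is "high",
# rank <= MEDIUM_MAX is "medium", anything rarer is "low".
HIGH_MAX = 500
MEDIUM_MAX = 3000


def band(word: str) -> str:
    """Return the frequency band of a word according to TOY_RANKS."""
    rank = TOY_RANKS.get(word.lower())
    if rank is None:
        return "low"  # assumption: words missing from the list count as rare
    if rank <= HIGH_MAX:
        return "high"
    if rank <= MEDIUM_MAX:
        return "medium"
    return "low"


def highlight(text: str) -> tuple[str, list[str]]:
    """Wrap medium-frequency words in [..] and low-frequency words in {..}.

    Returns the marked-up text and a de-duplicated list of the flagged
    words (lowercased, in order of first appearance).
    """
    flagged: list[str] = []

    def mark(match: re.Match) -> str:
        word = match.group(0)
        b = band(word)
        if b == "high":
            return word  # common words are left unmarked
        if word.lower() not in flagged:
            flagged.append(word.lower())
        return f"[{word}]" if b == "medium" else f"{{{word}}}"

    marked = re.sub(r"[A-Za-z]+(?:'[A-Za-z]+)?", mark, text)
    return marked, flagged
```

For example, `highlight("The new report said inflation and stagnation in the economy.")` marks "economy" as medium and "inflation" and "stagnation" as low, and returns those three words as the list to study offline.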



Second, you can click on any word in your text to get detailed information about it, all on one screen: its overall frequency in COCA, its frequency in each genre (spoken, fiction, magazine, newspaper, and academic), its 20-30 most frequent collocates (nearby words), up to 200 sample concordance lines, and synonyms and related words from WordNet. There's no need to consult other dictionaries, thesauruses, or online resources -- it's all right there, just one click away for each and every word in your text.



Finally, you can also see detailed information about phrases in your text. Just click on a phrase in the text, and it will show you related phrases from COCA. For example, if you're writing a paper and have used the phrase "potent argument", you could click on that phrase and have it suggest related phrases based on COCA data -- in this case, phrases in which a synonym of "potent" is followed by "argument". It would list "strong / persuasive / convincing argument" (all of which are more common in COCA than "potent argument"). It will show you the frequency of each phrase in COCA, and you can click on any of these to see them in context in the corpus. In this way, it serves as a sort of "grammatical thesaurus" for finding just the right phrase in English.
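The "grammatical thesaurus" idea can be illustrated with a small sketch: substitute synonyms into one slot of a two-word phrase, look up how often each candidate occurs, and rank the attested ones by frequency. This is only an illustration under stated assumptions, not the site's actual method. The synonym list is a hand-made stand-in for WordNet, the bigram counts are invented toy numbers (NOT COCA frequencies), and `related_phrases` is a hypothetical name.

```python
# Hand-made stand-in for a synonym source such as WordNet (hypothetical).
TOY_SYNONYMS = {
    "potent": ["strong", "persuasive", "convincing", "powerful", "forceful"],
}

# Invented bigram counts for illustration only; these are NOT COCA figures.
TOY_BIGRAMS = {
    ("strong", "argument"): 30,
    ("persuasive", "argument"): 20,
    ("convincing", "argument"): 10,
    ("powerful", "argument"): 5,
    ("potent", "argument"): 1,
}


def related_phrases(modifier: str, head: str) -> list[tuple[str, int]]:
    """Rank "<synonym of modifier> <head>" phrases by (toy) corpus frequency.

    The original modifier is included so its frequency can be compared
    with the alternatives; candidates with a count of zero are dropped.
    """
    candidates = [modifier] + TOY_SYNONYMS.get(modifier, [])
    scored = [(f"{w} {head}", TOY_BIGRAMS.get((w, head), 0)) for w in candidates]
    attested = [pair for pair in scored if pair[1] > 0]
    # Most frequent first; ties broken alphabetically for a stable order.
    attested.sort(key=lambda pair: (-pair[1], pair[0]))
    return attested
```

With these toy counts, `related_phrases("potent", "argument")` puts "strong argument", "persuasive argument", and "convincing argument" ahead of "potent argument", and drops "forceful argument" because it has no count in the toy table.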



All of this is now available at http://www.wordandphrase.info/, along with the features that were there before, including the ability to browse through and search a huge frequency dictionary of English and see detailed information about any word. If you are interested in English words and phrases, their meaning, their frequency, and their distribution in different genres, we believe that this will be an exciting new resource. And as with all of our corpora, it is available for free.



Best,



Mark Davies



============================================

Mark Davies

Professor of (Corpus) Linguistics

Brigham Young University

(phone) 801-422-9168 / (fax) 801-422-0906



http://davies-linguistics.byu.edu



** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **

============================================
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

