COHA/COCA meets Google Books
Neal Whitman
nwhitman at AMERITECH.NET
Fri May 13 00:43:43 UTC 2011
For those who may not have gotten this message directly:
> From: Mark Davies <mark_davies at BYU.EDU>
> Date: May 12, 2011 7:20:05 PM EDT
> To: CORPORA at LISTSERV.BYU.EDU
> Subject: 155 *billion* (155,000,000,000) word corpus of American English
> Reply-To: "Users of corpus.byu.edu" <CORPORA at LISTSERV.BYU.EDU>
>
> This email is being sent to people who 1) have registered for the corpora
> at http://corpus.byu.edu 2) have identified themselves as a "researcher"
> and 3) have used the corpora several times in the last few months.
>
> --------------------------------
>
> We’re pleased to announce a new corpus -- the Google Books (American
> English) corpus: http://googlebooks.byu.edu/.
>
> This corpus is based on the American English portion of the Google Books
> data (see http://ngrams.googlelabs.com and especially
> http://ngrams.googlelabs.com/datasets). It contains 155 *billion* words
> (155,000,000,000) in more than 1.3 million books from the 1810s-2000s
> (including 62 billion words from just 1980-2009).
>
> The corpus has most of the functionality of the other corpora from
> http://corpus.byu.edu (e.g. COCA, COHA, and our interface to the BNC),
> including: searching by part of speech, wildcards, and lemma (and thus
> advanced syntactic searches), synonyms, collocate searches, frequency by
> decade (tables listing each individual string, or charts for total
> frequency), comparisons of two historical periods (e.g. collocates
> of "women" or "music" in the 1800s and the 1900s), and more.
>
> This American English corpus is just one of seven Google Books-based
> corpora that we hope to create in the next year or two (contingent on
> funding, which we are applying for in June 2011). If funded, the other
> corpora will include British English, English from the 1500s-1700s, and
> corpora of Spanish, French, and German (see the listing at
> http://ngrams.googlelabs.com/datasets). Each of these corpora will be
> based on at least 50 billion words of data, and they should represent a
> nice addition to existing resources.
>
> The Google Books (American English) corpus is freely available at
> http://googlebooks.byu.edu, and we hope that it is of value to you in your
> research and teaching.
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org