Fwd: [Corpora-List] Corpus of Historical American English (400 million words, 1810-2009)

Wed Sep 8 23:50:08 UTC 2010

FYI.

Barbara

Begin forwarded message:

> From: Mark Davies <Mark_Davies at byu.edu>
> Date: 8 September 2010 9:53:39 AM EDT
> To: "corpora at uib.no" <corpora at uib.no>
> Subject: [Corpora-List] Corpus of *Historical* American English (400
> million words, 1810-2009)
>
> Note: some users of the corpora at http://corpus.byu.edu/ may
> already be aware of the following corpus, but we want to announce it
> here for the general community of corpus linguists.
>
> -----------------------
>
> We are pleased to announce the release of the 400 million word
> Corpus of Historical American English (1810-2009). The corpus has
> been funded by a generous grant from the US National Endowment for
> the Humanities, and it is freely available at
> http://corpus.byu.edu/coha/. COHA is the largest structured corpus of
> historical English, and
> it contains more than 100,000 texts from fiction, popular magazines,
> newspapers, and non-fiction books, with the same genre balance
> decade by decade from the 1810s-2000s.
>
> COHA is also related to other large corpora that we have created or
> modified, including the 410 million word Corpus of Contemporary
> American English (COCA), the 100 million word TIME Magazine Corpus
> (1920s-2000s), the 100 million word British National Corpus (our
> architecture and interface), the 100 million word NEH-funded Corpus
> del Español (1200s-1900s), and the NEH-funded 45 million word Corpus
> do Português (1300s-1900s). For information on these corpora, see
> http://corpus.byu.edu.
>
> COHA allows you to quickly and easily search the 400 million words
> of text from the 1810s-2000s to see how words, phrases and
> grammatical constructions have increased or decreased in frequency,
> how words have changed meaning over time, and how stylistic changes
> have taken place in the language. Users can see the overall
> (normalized) frequency by decade and year, as well as the frequency
> of each matching string, by decade.
>
> The following are just a small sample of an unlimited number of
> queries, but they should give some idea of what the corpus can do.
>
> * Lexical change: the rise and fall of words and phrases like the
> following:
> - (decrease since the 1800s): bosom, folly, grieved, bestow*,
> quaint, beauteous, fellow, sublime, lad, many a time, of no little,
> for (conj)
> - (an increase and then decrease): mustn't, naughty, boyish, agog,
> toddle, far-out, famed, wangle, swell (adj), lousy
> - (an increase to the present time): a lot of, unleash, sexual, calm
> down, screw up, freak out, mommy, skills, frustrating
> - (words reflecting historical and cultural shifts): emancipation,
> steamship, telegraph, flapper*, fascis*, teenage*, communis*, global
> warming
>
> * Stylistic change (which gives the flavor of a different time
> period). Examples from the 1800s, which have decreased since then,
> are: [so ADJ as to V] (so good as to show me), [PRON be but] (they
> are but the last examples), [have quite V-ed] (until she had quite
> finished), [NOUN be that of] (her dress was that of a beggar), or [a
> most ADJ NOUN] (a most helpful child).
>
> * Morphological change: how word roots, prefixes, and suffixes have
> been used over time, including comparisons between different
> periods, such as -heart- (1800s noble-hearted, 1900s
> heart-stopping), home- (1800s homebred, 1900s homeowner), or -able
> adjectives (1800s placable, 1900s predictable).
>
> * Syntactic change (since the corpus is tagged and lemmatized), like
> [end up V-ing], [going to V], [V PRON into V-ing] (e.g. talked them
> into going), phrasal verbs with [up] (e.g. make up, show up),
> post-verbal negation with [need] (needn't mention), the "get" passive
> (get hired), sentence-initial "hopefully", and semi-modals like
> [need to] and [have to].
>
> * Semantic change: how the meaning or usage of words has changed
> over time, by looking at changes in collocates (co-occurring words),
> like [sexual, gay, chip, engine, or web]. This can also signal
> cultural changes over time, such as nouns used with [woman] in the
> 1930s-50s compared to the 1960s-80s (fabrics, hips // liberation,
> abortion), or nouns used with [problem] in the 1810s-1920s compared
> to the 1920s-2000s (railway, trust // drugs, pollution).
>
> * Lexical change (again): users can also have the corpus generate a
> list of words that were used more in one period than another, even
> when they don't know what the specific words might be. For example,
> the corpus can generate lists of verbs in the 1970s-2000s compared to
> the 1930s-1960s (download, recycle // effectuate, redound),
> adjectives in the 1970s-2000s compared to the 1930s-1960s (online,
> affordable // leftist, communistic), or -ly adverbs in the 1900s
> compared to the 1800s (basically, reportedly // despondingly,
> sportively).
>
> As can be seen, the corpus allows research on a wide range of
> phenomena in a robust 400 million word corpus from the last two
> centuries of American English. The corpus is freely available at
> http://corpus.byu.edu/coha/, and we invite you to use it for your
> research and teaching.
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
>
> http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org
