Zero vs. "that" relatives (and TIME Corpus)

Mark Davies Mark_Davies at BYU.EDU
Tue Dec 30 14:40:14 UTC 2008


I've been in DIGEST mode over the holiday break, hence the delay in responding:

> > here it would be nice to have data from a source other than Time, to
> > find out whether the change was the result of changing editorial
> > practices at the magazine.

>> My feelings exactly. It might be hard to extrapolate the Time data to
>> journalistic usage more generally,

On the other hand ....

During the past year, I've had my students use the TIME Corpus (http://corpus.byu.edu/time) in papers they've written on 40-50 different syntactic / stylistic shifts in American English from the 1920s to the 2000s. These have covered a wide variety of topics: modals (shall/will, will/going to, can/may), preposition stranding, several phenomena with verbal complementation, aspects of morphology (gender, plurals, +/-regular verbal forms), get vs. be passives, progressives, the subjunctive, etc. (see the list at http://davies-linguistics.byu.edu/elang325/project.asp). The data from the corpus has been quite useful. In most cases, it matches very closely what others have already found with smaller, "boutique" corpora.

In addition, though, I mentioned the following yesterday in a private email (which I didn't post directly to ADS-L):

The TIME Corpus is more or less a stopgap, until a larger, more diverse, more balanced corpus of historical American English is available. I'm currently working on a 300-million-word "Corpus of Historical American English" (COHA), which will complement the nearly 400-million-word Corpus of Contemporary American English (COCA): http://www.americancorpus.org .
COHA will cover approximately 1810-present, and it will be balanced (for each decade, and therefore overall as well) between fiction, popular magazines, newspapers, and other non-fiction. Once completed, this will allow us to examine -- for the first time -- how specific changes have spread over time through different genres in American English. Thus the TIME Corpus -- while quite useful for many things -- is more or less a stopgap for the 1900s, until COHA is completed.

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


