Zero vs. "that" relatives (and TIME Corpus) (UNCLASSIFIED)

Tue Dec 30 17:50:09 UTC 2008

Classification:  UNCLASSIFIED
Caveats: NONE

Just curious: how many words are in the TIME corpus?

> -----Original Message-----
> From: American Dialect Society
> [mailto:ADS-L at LISTSERV.UGA.EDU] On Behalf Of Mark Davies
> Sent: Tuesday, December 30, 2008 8:40 AM
> To: ADS-L at LISTSERV.UGA.EDU
> Subject: Re: Zero vs. "that" relatives (and TIME Corpus)
>
> ---------------------- Information from the mail header -----------------------
> Sender:       American Dialect Society <ADS-L at LISTSERV.UGA.EDU>
> Poster:       Mark Davies <Mark_Davies at BYU.EDU>
> Subject:      Re: Zero vs. "that" relatives (and TIME Corpus)
> -------------------------------------------------------------------------------
>
> I've been in DIGEST mode over the holiday break, hence the
> delay in responding:
>
> > > here it would be nice to have data from a source other than Time, to
> > > find out whether the change was the result of changing editorial
> > > practices at the magazine.
>
> >> My feelings exactly. It might be hard to extrapolate the Time data to
> >> journalistic usage more generally,
>
> On the other hand ....
>
> During the past year, I've had my students use the TIME
> Corpus (http://corpus.byu.edu/time) as part of papers they've
> written on 40-50 different syntactic / stylistic shifts in
> American English from the 1920s-2000s. These have covered a
> wide variety of topics -- modals (shall/will, will/going to,
> can/may), preposition stranding, several phenomena with
> verbal complementation, aspects of morphology (gender,
> plurals, +/-regular verbal forms), get vs be passives,
> progressives, subjunctive, etc etc etc (see list at
> http://davies-linguistics.byu.edu/elang325/project.asp). The
> data from the corpus has been quite useful. In most cases, it
> models very nicely what others have already found with
> smaller, "boutique" corpora.
>
> In addition, though, I mentioned the following yesterday in a
> private email (which I didn't post directly to ADS-L):
>
> The TIME corpus is more or less a stopgap, until a larger,
> more diverse, more balanced corpus of historical American
> English is available. I'm currently working on a 300 million
> word "Corpus of Historical American English" (COHA), which
> will complement the nearly 400 million word Corpus of
> Contemporary American English (COCA): http://www.americancorpus.org .
> COHA will cover approximately 1810-present, and it will be
> balanced (for each decade, and therefore overall as well)
> between fiction, popular magazines, newspapers, and other
> non-fiction. Once completed, this will allow us to examine --
> for the first time -- how specific changes have spread over
> time through different genres in American English. Thus the
> TIME corpus -- while quite useful for many things -- is more
> or less a stopgap for the 1900s, until COHA is completed.
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org
>
