Zero vs. "that" relatives (and TIME Corpus)
Mark Davies
Mark_Davies at BYU.EDU
Tue Dec 30 18:50:18 UTC 2008
> Just curious, how many words is the TIME corpus?
100+ million words, 1920s-2000s.
Of course there are larger *text archives* (Google Books, NY Times, other newspapers, etc.). But all of these have very limited architectures and interfaces, which mostly just let you:
-- find the first occurrence of a word
-- show all 18,489 occurrences of a word (one ... by ... one)
-- etc etc
None of those text archives can really do things like:
-- (easily) see the frequency over time (decade by decade, year by year)
-- use part of speech or lemmatization (thus pretty limited for syntactic change)
-- wildcards; see all matching forms (thus pretty limited for morphological change)
-- collocates (thus pretty limited for semantic change)
-- use the frequency in different historical periods as part of the query (e.g. collocates of Word X in Time Y vs Time Z)
The TIME Corpus can do all of these.
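As a rough illustration of the kinds of queries listed above (this is not how the TIME Corpus itself is implemented), here is a minimal Python sketch over a made-up toy corpus. The records, the function names, and the window size are all hypothetical; a real corpus would also need part-of-speech tagging and lemmatization, which this sketch leaves out.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical toy records of (year, text). This is NOT real TIME Corpus data.
DOCS = [
    (1925, "the radio was a marvel and the radio sold well"),
    (1928, "a flapper danced while the radio played"),
    (1995, "the web was new and the radio played on"),
    (1998, "surfing the web and browsing the web daily"),
]


def decade(year):
    """Map a year to the first year of its decade, e.g. 1928 -> 1920."""
    return year // 10 * 10


def tokens(text):
    """Naive whitespace tokenizer (an assumption; real corpora do far more)."""
    return text.lower().split()


def freq_by_decade(docs, pattern):
    """Count tokens matching a shell-style wildcard pattern, per decade."""
    counts = Counter()
    for year, text in docs:
        for tok in tokens(text):
            if fnmatchcase(tok, pattern):
                counts[decade(year)] += 1
    return dict(counts)


def collocates(docs, node, start, end, window=2):
    """Count words within `window` tokens of `node`, for years start..end."""
    counts = Counter()
    for year, text in docs:
        if not start <= year <= end:
            continue
        toks = tokens(text)
        for i, tok in enumerate(toks):
            if tok != node:
                continue
            lo = max(0, i - window)
            hi = min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[toks[j]] += 1
    return counts


if __name__ == "__main__":
    # Frequency over time, with a wildcard pattern.
    print(freq_by_decade(DOCS, "play*"))
    # Collocates of one word in two different periods.
    print(collocates(DOCS, "radio", 1920, 1929).most_common(3))
    print(collocates(DOCS, "radio", 1990, 1999).most_common(3))
```

Comparing the two collocate lists side by side is the toy equivalent of "collocates of Word X in Time Y vs Time Z". The point is only that each query needs the text indexed by date and by token, which a plain text archive does not offer.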
Of course, it is just one source in just one genre -- hence the need for something like the Corpus of Historical American English.
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org