Linguistic dark matter
Arnold Zwicky
zwicky at STANFORD.EDU
Fri Dec 17 16:14:34 UTC 2010
On Dec 17, 2010, at 3:13 AM, Michael Quinion wrote:
> Science reports on a massive searchable corpus created from some five
> million books, now available on Google: http://ngrams.googlelabs.com/
>
> One report is here: http://bit.ly/ffQCmR . It quotes the researchers:
>
> "We estimated that 52% of the English lexicon - the majority of words used
> in English books - consist of lexical 'dark matter' undocumented in
> standard references."
GN, 12/16/10: Humanities research with the Google books corpus:
http://languagelog.ldc.upenn.edu/nll/?p=2847
which doesn't, however, take up the issue of "linguistic dark matter", though more postings are to come.
arnold
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org