Linguistic dark matter

Ben Zimmer bgzimmer at BABEL.LING.UPENN.EDU
Fri Dec 17 15:14:52 UTC 2010

On Fri, Dec 17, 2010 at 9:14 AM, Michael Quinion
<wordseditor at> wrote:
> David Barnhart wrote
> > If you haven't noticed I'm skeptical of the "tool".
> I'm certainly sceptical of that 52% "undocumented in standard references",
> which was why I quoted that sentence. The figure seems extremely high. As
> I can't get access to the Science article (which is only free online to
> subscribers), I can't begin to work out its basis.
> The researchers seem not to have applied many lexical filters. Proper
> names are included, because they want the corpus to be a cultural tool as
> well as a lexicographical one. Similarly, they allow scientific names
> ("Turdus merula" and the like). I would have thought that - if the
> "standard references" are restricted to general dictionaries - proper and
> scientific names would account for a big part of that missing 52%.

To be fair, proper nouns were included in the researchers' overall
lexical count, but the "dark matter" is not 52% of that number. They
did filter out proper nouns for that part of the analysis, since they
were going for an apples-to-apples comparison with the OED and
Webster's Third. The media coverage doesn't get into these subtleties,
of course.


Ben Zimmer

The American Dialect Society -
