Google and "culturomics"
Paul B. Gallagher
paulbg at PBG-TRANSLATIONS.COM
Wed Dec 22 09:08:52 UTC 2010
Several news media published stories on Friday about a new Google tool
<http://ngrams.googlelabs.com/> that allows the user to graph the
frequency of words and phrases (up to five words long) in a huge corpus
over time. See, for example:
<http://www.guardian.co.uk/science/2010/dec/16/culturomics-google-tool-cultural-trends>
<http://www.nytimes.com/2010/12/17/books/17words.html>
There's also a Language Log thread:
<http://languagelog.ldc.upenn.edu/nll/?p=2848>
And a Science article:
<http://www.sciencemag.org/content/early/2010/12/15/science.1199644>
Quantitative Analysis of Culture Using Millions of Digitized Books
Jean-Baptiste Michel et al.
Abstract: We constructed a corpus of digitized texts containing about 4%
of all books ever printed. Analysis of this corpus enables us to
investigate cultural trends quantitatively. We survey the vast terrain
of "culturomics", focusing on linguistic and cultural phenomena that
were reflected in the English language between 1800 and 2000. We show
how this approach can provide insights about fields as diverse as
lexicography, the evolution of grammar, collective memory, the adoption
of technology, the pursuit of fame, censorship, and historical
epidemiology. "Culturomics" extends the boundaries of rigorous
quantitative inquiry to a wide array of new phenomena spanning the
social sciences and the humanities.
Full text: <http://www.sciencemag.org/content/330/6011/1600.full.pdf>
Fair warning: this thing is addictive.
I asked the tool to plot "data is" vs. "data are" and found that the
plural usage peaked about 1983 and has tailed off since, while the
singular peaked about 1990 and has leveled off. Surprisingly, the
singular is still about a third less common than the plural. A similar
pattern can be seen for "media" -- the singular usage is growing, but has
not caught up to the plural.
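For anyone who wants to make comparisons like "peaked about 1990" or "a
third less common" outside the browser, here is a minimal sketch in
Python. It assumes you have already obtained per-year relative
frequencies for each phrase (how you get them is left open). The
function names and the numbers are invented placeholders for
illustration only, not actual output from the Google tool.

```python
from typing import Dict

# A series maps a year to a phrase's relative frequency in that year.
Series = Dict[int, float]


def peak_year(series: Series) -> int:
    """Return the year in which the phrase was most frequent.

    If several years tie, the earliest one in the dict's order wins.
    """
    return max(series, key=series.get)


def relative_gap(rarer: Series, commoner: Series, year: int) -> float:
    """Return how much less common `rarer` is than `commoner` in `year`,
    expressed as a fraction of `commoner` (0.5 means "half as common")."""
    return 1.0 - rarer[year] / commoner[year]


# Placeholder values in arbitrary units. These are NOT real corpus data;
# they were chosen only to make the arithmetic easy to follow.
singular = {1980: 2.0, 1990: 4.0, 2000: 4.0}
plural = {1980: 8.0, 1990: 7.0, 2000: 6.0}

print("singular peaks in", peak_year(singular))
print("plural peaks in", peak_year(plural))
print("gap in 2000:", round(relative_gap(singular, plural, 2000), 2))
```

With real per-year figures in place of the placeholders, the same two
functions would let you check a claim such as "still about a third less
common" for any pair of phrases.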
I also tried the Russian corpus and learned that "на Украине" has
risen and fallen as Ukraine was more or less a topic of conversation,
while "в Украине" clung to the floor until about 1990, when it suddenly
took off, nearly catching its traditional counterpart in 1999 before
falling back to about half the latter's frequency.
--
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
pbg translations, inc.
"Russian Translations That Read Like Originals"
http://pbg-translations.com
-------------------------------------------------------------------------
Use your web browser to search the archives, control your subscription
options, and more. Visit and bookmark the SEELANGS Web Interface at:
http://seelangs.home.comcast.net/
-------------------------------------------------------------------------