[Corpora-List] API for accessing Google's English ngrams?

Mark Davies Mark_Davies at byu.edu
Mon Aug 8 19:37:26 UTC 2011


Ruvan,

>> Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets?
>> Or is there any other online data source from which ngram data for English can be accessed?

For the American English dataset (155 billion words), you can use: http://googlebooks.byu.edu/ .

This allows you to search by wildcard (for letters and words), part of speech, lemma, synonyms, and customized lists. You can also search for collocates and compare n-grams across time periods (e.g. adjectives with "woman" in the 1820s-1910s vs. the 1960s-2000s), or limit a search to a subset of the n-grams data (e.g. just the 62 billion words from the 1980s-2000s).
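
For anyone who does end up working from the downloaded files, the period comparison mentioned above can be roughly approximated offline. The following is only a minimal sketch in Python, not how the site itself works. It assumes each line of a downloaded file is tab-separated as ngram, year, match_count, page_count, volume_count (please check this against the dataset's own documentation). The function name counts_by_period, the period labels, and the sample rows are all made up for illustration.

```python
from collections import defaultdict

# ASSUMPTION: tab-separated lines laid out as
#   ngram, year, match_count, page_count, volume_count
# Verify this against the documentation of the files you downloaded.

def counts_by_period(lines, periods):
    """Sum match_count per n-gram within each named (start, end) year range."""
    totals = {name: defaultdict(int) for name in periods}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        for name, (start, end) in periods.items():
            if start <= year <= end:  # both ends inclusive
                totals[name][ngram] += match_count
    return totals

# Hypothetical period labels mirroring the example in the message.
PERIODS = {"early": (1820, 1919), "late": (1960, 2009)}

# Invented sample rows, NOT real counts from the dataset.
sample = [
    "young woman\t1850\t120\t110\t90",
    "young woman\t1900\t80\t75\t60",
    "young woman\t1975\t300\t280\t200",
    "modern woman\t1905\t10\t10\t8",
    "modern woman\t1940\t7\t7\t5",
    "modern woman\t1990\t45\t40\t33",
]

totals = counts_by_period(sample, PERIODS)
for name in PERIODS:
    print(name, dict(totals[name]))
```

In practice you would pass an open file handle instead of the sample list. Raw counts from periods of different sizes are not directly comparable, so they would normally be normalized by the total number of words in each period before drawing any conclusions.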

I've applied for a grant to do the same thing with the other Google Books datasets (e.g. British English, English back to the 1500s, One Million Books dataset, and Spanish, French, and German).

Best,

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu
 
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================


From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Ruvan Weerasinghe [arw at ucsc.cmb.ac.lk]
Sent: Monday, August 08, 2011 11:36 AM
To: Corpora at uib.no
Subject: [Corpora-List] API for accessing Google's English ngrams?


Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets?
Or is there any other online data source from which ngram data for English can be accessed?



Ruvan Weerasinghe
University of Colombo School of Computing
Colombo 00700,
Sri Lanka.

Web:    http://www.ucsc.lk
Phone:  +94112158953; Fax:    +94112587239


