[Corpora-List] API for accessing Google's English ngrams?

Ruvan Weerasinghe arw at ucsc.cmb.ac.lk
Tue Aug 9 02:42:28 UTC 2011


This is very nice. Is there any way to query this from a program through an API? Or are there any plans to provide one? 




Ruvan Weerasinghe 
University of Colombo School of Computing 
Colombo 00700, 
Sri Lanka. 

Web: http://www.ucsc.lk 
Phone: +94112158953; Fax: +94112587239 




From: "Mark Davies" <Mark_Davies at byu.edu> 
To: "Ruvan Weerasinghe" <arw at ucsc.cmb.ac.lk>, Corpora at uib.no 
Sent: Tuesday, August 9, 2011 1:07:26 AM 
Subject: RE: [Corpora-List] API for accessing Google's English ngrams? 

Ruvan, 

>> Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets? 
>> Or is there any other online data source from which ngram data for English can be accessed? 

For the American English dataset (155 billion words), you can use http://googlebooks.byu.edu/. 

This interface lets you search by wildcard (for letters and words), part of speech, lemma, synonyms, and customized lists. You can also search for collocates and compare n-grams across time periods (e.g. adjectives with "woman" in the 1820s-1910s vs. the 1960s-2000s), or limit a search to a subset of the n-gram data (e.g. just the 62 billion words from the 1980s-2000s). 
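For readers who do download the files, a rough version of the "woman" comparison can be done locally. This is only an illustrative sketch, not how googlebooks.byu.edu works: it assumes each line of a downloaded 2-gram file is tab-separated and begins with the n-gram, the year, and the match count (any further fields are ignored), and the function name and period labels are hypothetical. Raw n-grams carry no part-of-speech tags, so it counts every word to the left of the node word rather than only adjectives.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def left_collocates_by_period(
    lines: Iterable[str],
    node: str,
    periods: Dict[str, Tuple[int, int]],
) -> Dict[str, Counter]:
    """Sum match counts of words immediately left of `node`, per year range.

    Hypothetical helper. Assumes tab-separated lines starting with
    ngram, year, match_count; any extra columns are ignored.
    """
    result = {name: Counter() for name in periods}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        ngram, year, count = fields[0], int(fields[1]), int(fields[2])
        words = ngram.split(" ")
        # Keep only 2-grams whose second word is the node word.
        if len(words) != 2 or words[1] != node:
            continue
        for name, (start, end) in periods.items():
            if start <= year <= end:  # inclusive year range
                result[name][words[0]] += count
    return result
```

In practice `lines` would be an open file handle over one of the downloaded 2-gram files, so the data is streamed rather than loaded into memory at once.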

I've applied for a grant to do the same thing with the other Google Books datasets (e.g. British English, English back to the 1500s, One Million Books dataset, and Spanish, French, and German). 

Best, 

Mark D. 

============================================ 
Mark Davies 
Professor of Linguistics / Brigham Young University 
http://davies-linguistics.byu.edu 

** Corpus design and use // Linguistic databases ** 
** Historical linguistics // Language variation ** 
** English, Spanish, and Portuguese ** 
============================================ 


From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Ruvan Weerasinghe [arw at ucsc.cmb.ac.lk] 
Sent: Monday, August 08, 2011 11:36 AM 
To: Corpora at uib.no 
Subject: [Corpora-List] API for accessing Google's English ngrams? 


Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets? 
Or is there any other online data source from which ngram data for English can be accessed? 



Ruvan Weerasinghe 
University of Colombo School of Computing 
Colombo 00700, 
Sri Lanka. 

Web: http://www.ucsc.lk 
Phone: +94112158953; Fax: +94112587239 