[Corpora-List] API for accessing Google's English ngrams?

Min-Yen Kan knmnyn at gmail.com
Tue Aug 9 02:26:03 UTC 2011


Hi Ruvan, all:

You might also try out Microsoft's Web N-gram data, which I believe is
available through web service calls (not 100% sure of this, haven't
tried it myself), and which need not (or perhaps cannot) be downloaded.

http://research.microsoft.com/en-us/collaboration/focus/cs/web-ngram.aspx

Cheers,

Min

--
Min-Yen KAN (Dr) :: Associate Professor :: National University of
Singapore :: NUS School of Computing, AS6 05-12, 13 Computing Drive
Singapore 117417 :: 65-6516 1885(DID) :: 65-6779 4580 (Fax) ::
kanmy at comp.nus.edu.sg (E) :: www.comp.nus.edu.sg/~kanmy (W)




On Tue, Aug 9, 2011 at 3:37 AM, Mark Davies <Mark_Davies at byu.edu> wrote:
> Ruvan,
>
> > Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets?
> > Or is there any other online data source from which ngram data for English can be accessed?
>
> For the American English dataset (155 billion words), you can use: http://googlebooks.byu.edu/ .
>
> This allows you to search by wildcard (for letters and words), part of speech, lemma, synonyms, and customized lists. You can also search for collocates, as well as compare n-grams across different time periods (e.g. adjectives with "woman" 1820s-1910s vs 1960s-2000s). You can also limit to just a subset of the n-grams data (e.g. just the 62 billion words of data from the 1980s-2000s).
>
> I've applied for a grant to do the same thing with the other Google Books datasets (e.g. British English, English back to the 1500s, the One Million Books dataset, and Spanish, French, and German).
>
> Best,
>
> Mark D.
>
> ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>
>
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Ruvan Weerasinghe [arw at ucsc.cmb.ac.lk]
> Sent: Monday, August 08, 2011 11:36 AM
> To: Corpora at uib.no
> Subject: [Corpora-List] API for accessing Google's English ngrams?
>
>
> Is there a way to query Google's English ngram data without downloading the files from http://ngrams.googlelabs.com/datasets?
> Or is there any other online data source from which ngram data for English can be accessed?
>
>
>
> Ruvan Weerasinghe
> University of Colombo School of Computing
> Colombo 00700,
> Sri Lanka.
>
> Web:    http://www.ucsc.lk
> Phone:  +94112158953; Fax:    +94112587239
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
