[Corpora-List] (Google Books) n-grams

Mark Davies Mark_Davies at byu.edu
Thu Jul 18 16:22:30 UTC 2013


>> Access to Google n-grams seems to have sparked interest in studies into historical changes in social, cultural, and political values?

Problem is, the standard Google Books "n-grams" site (http://books.google.com/ngrams/) doesn't really do much with the n-grams themselves, except to search for *specific, exact phrases* inputted by the user. For example, it can't find the most common adjectives near "food", or the most common nouns near "fast".
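
Here is a minimal Python sketch of the difference between a pattern query and an exact-phrase lookup. It counts tagged bigrams, then asks for the most frequent adjectives immediately before "food" or the most frequent nouns immediately after "fast". This is not the method used by either site. Several things are assumptions made only for illustration and do not describe the Google Books or BYU data or interfaces: the (word, tag) token format, the Penn Treebank-style tags "JJ" and "NN", and the function names bigram_counts and collocates. "Near" is also simplified here to "immediately adjacent".

```python
from collections import Counter

# Illustrative assumption: a token is a (word, POS tag) pair with Penn
# Treebank-style tags. This is NOT the Google Books / BYU file format.
Token = tuple[str, str]


def bigram_counts(tokens: list[Token]) -> Counter:
    """Count adjacent token pairs (bigrams) in a tagged token sequence."""
    return Counter(zip(tokens, tokens[1:]))


def collocates(counts: Counter, node: str, tag: str,
               side: str = "left", top: int = 5) -> list[tuple[str, int]]:
    """Most frequent words tagged `tag` directly to one side of `node`.

    side="left" answers "which adjectives come right before 'food'?";
    side="right" answers "which nouns come right after 'fast'?".
    Only immediate neighbours are counted, not a wider window.
    """
    hits: Counter = Counter()
    for (first, second), freq in counts.items():
        if side == "left":
            neighbour, anchor = first, second
        else:
            neighbour, anchor = second, first
        if anchor[0] == node and neighbour[1] == tag:
            hits[neighbour[0]] += freq
    return hits.most_common(top)


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    tokens = [("fast", "JJ"), ("food", "NN"), ("is", "VBZ"),
              ("cheap", "JJ"), ("food", "NN"), ("and", "CC"),
              ("fast", "JJ"), ("food", "NN")]
    counts = bigram_counts(tokens)
    print(collocates(counts, "food", "JJ"))                # [('fast', 2), ('cheap', 1)]
    print(collocates(counts, "fast", "NN", side="right"))  # [('food', 2)]
```

On real data the query would run over precomputed n-gram frequency tables rather than over raw text, but the logic would be the same: match a pattern (tag plus node word) instead of a fixed string.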

At http://googlebooks.byu.edu/, though, much more of the potential of the Google Books n-grams data is available -- for research on historical and cultural shifts. For a number of examples, see: http://googlebooks.byu.edu/compare-googleBooks.asp.

And for those who want access to the n-grams from the 400 million word Corpus of Historical American English (http://corpus.byu.edu/coha/), there are freely-available n-grams as well: http://www.ngrams.info/download_coha.asp. (This is in addition to the COCA n-grams: http://www.ngrams.info/).

Best,

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
> Of Krishnamurthy, Ramesh
> Sent: Thursday, July 18, 2013 10:02 AM
> To: cedric.krummes at uni-leipzig.de
> Cc: corpora at uib.no
> Subject: [Corpora-List] (no subject)
> 
> Hi Cedric
>
> As we cannot be sure of the meaning or the part-of-speech of an item
> from a word frequency list, are not n-grams a sort of halfway house
> between word frequency lists and concordances?
>
> To me, n-grams is just one of the tools in the corpus linguistics toolbag,
> although it may be a relative newcomer, and hasn't grabbed the headlines
> like keywords, perhaps.
>
> If I remember correctly, at Cobuild, we first used bigrams for the BBC
> dictionary (published in 1992). I don't think n-grams was a feature of
> the earlier versions of WordSmith, and even in the more recent
> AntConc, the n-grams option is slightly hidden.
>
> Since the 1990s, I have used n-grams as a routine part of corpus
> analysis, if they are available in the software I am using at the time,
> for a variety of purposes (eg investigating language varieties in 'The
> Globalization of Business English?' at Complex 2001; investigating
> genre features in 'A corpus-based analysis of junk emails' at LREC
> 2002; and recently, to compare Business Spanish and Business French
> in research for the COMENEGO project).
>
> Access to Google n-grams seems to have sparked interest in studies
> into historical changes in social, cultural, and political values?
>
> best
>
> Ramesh
> 
> -----------------------------------------------------------------------
> 
> Date: Thu, 18 Jul 2013 09:51:30 +0200
> From: Cedric Krummes <cedric.krummes at uni-leipzig.de>
> Subject: [Corpora-List] Uses of N-grams?
> To: Corpora at uib.no
> 
> Hello,
> 
> Regarding n-grams (highly frequent word sequences like "on the other hand"
> or "why don't you"), does anybody know of any uses for them apart from
> language teaching?
> 
> Most literature dealing with n-grams seems to apply them to foreign
> language teaching, second language acquisition, or English for X purposes.
> Any other uses?
> 
> Best wishes,
> 
> Cédric Krummes
> --
> Dr. Cédric Krummes
> 
> Universität Leipzig · +49-341-97-37404
> http://www.cedrickrummes.org/contact.php
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
