[Corpora-List] (Google Books) n-grams

Octavian Popescu popescu at fbk.eu
Fri Jul 19 09:22:49 UTC 2013


Hi,

Regarding the Google n-grams, we think there is great potential for research using this corpus.

I would recommend this paper:

"Behind the Times: Detecting Epoch Changes using Large Corpora"
Octavian Popescu and Carlo Strapparava
accepted at IJCNLP 2013, Nagoya, Japan

Thanks,
Best Regards,
Octavian

________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Krishnamurthy, Ramesh [r.krishnamurthy at aston.ac.uk]
Sent: Friday, July 19, 2013 10:52 AM
To: Mark Davies
Cc: corpora at uib.no
Subject: Re: [Corpora-List] (Google Books) n-grams

I'm in total agreement, Mark! :)

I wasn't recommending Google n-grams as a resource.

The original query asked about uses of n-grams, and there have been several posts on Facebook
(including one or two of my own) about informal uses of Google n-grams, as well as links to more
formal research into changes in social, cultural, and political terms (perhaps more on the Computational
Linguistics group page than the Corpus Linguistics one)... so I just thought they deserved a mention...

I haven't yet accessed the BYU n-grams, but hope to do so in the near future! :)
best
Ramesh
________________________________________
From: Mark Davies [Mark_Davies at byu.edu]
Sent: 18 July 2013 17:22
To: Krishnamurthy, Ramesh
Cc: corpora at uib.no
Subject: (Google Books) n-grams

>> Access to Google n-grams seems to have sparked interest in studies into historical changes in social, cultural, and political values?

Problem is, the standard Google Books "n-grams" site (http://books.google.com/ngrams/) doesn't really do much with the n-grams themselves, except to search for *specific, exact phrases* inputted by the user. For example, it can't find the most common adjectives near "food", or the most common nouns near "fast".

At http://googlebooks.byu.edu/, though, much more of the potential of the Google Books n-grams data is available -- for research on historical and cultural shifts. For a number of examples, see: http://googlebooks.byu.edu/compare-googleBooks.asp.
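
To make the kind of query mentioned above concrete (the most common adjectives near "food", the most common nouns near "fast"), here is a minimal Python sketch over a small table of part-of-speech-tagged bigram counts. It is not how either site works internally. The table layout, the tag names, and all of the counts are invented for illustration, and "near" is simplified to "immediately adjacent".

```python
from collections import Counter

# Hypothetical POS-tagged bigram counts: (word1, tag1, word2, tag2) -> frequency.
# The figures are invented for illustration, not taken from Google Books.
BIGRAMS = {
    ("fast", "ADJ", "food", "NOUN"): 120,
    ("good", "ADJ", "food", "NOUN"): 95,
    ("the", "DET", "food", "NOUN"): 400,
    ("fresh", "ADJ", "food", "NOUN"): 60,
    ("fast", "ADJ", "car", "NOUN"): 80,
    ("fast", "ADJ", "track", "NOUN"): 45,
    ("fast", "ADV", "enough", "ADV"): 30,
}


def left_collocates(bigrams, node, tag, top=10):
    """Most frequent words with POS `tag` immediately before `node`."""
    counts = Counter()
    for (w1, t1, w2, _t2), freq in bigrams.items():
        if w2 == node and t1 == tag:
            counts[w1] += freq
    return counts.most_common(top)


def right_collocates(bigrams, node, tag, top=10):
    """Most frequent words with POS `tag` immediately after `node`."""
    counts = Counter()
    for (w1, _t1, w2, t2), freq in bigrams.items():
        if w1 == node and t2 == tag:
            counts[w2] += freq
    return counts.most_common(top)


# Adjectives directly before "food", then nouns directly after "fast".
print(left_collocates(BIGRAMS, "food", "ADJ"))
print(right_collocates(BIGRAMS, "fast", "NOUN"))
```

The point of the sketch is only that such questions need the n-gram table to be queried by slot and part of speech, rather than by one exact phrase at a time.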

And for those who want access to the n-grams from the 400 million word Corpus of Historical American English (http://corpus.byu.edu/coha/), there are freely-available n-grams as well: http://www.ngrams.info/download_coha.asp. (This is in addition to the COCA n-grams: http://www.ngrams.info/).

Best,

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf
> Of Krishnamurthy, Ramesh
> Sent: Thursday, July 18, 2013 10:02 AM
> To: cedric.krummes at uni-leipzig.de
> Cc: corpora at uib.no
> Subject: [Corpora-List] (no subject)
>
> Hi Cedric
>
> As we cannot be sure of the meaning or the part-of-speech of an item
> from a word frequency list, are not n-grams a sort of halfway house
> between word frequency lists and concordances?
>
> To me, n-grams is just one of the tools in the corpus linguistics toolbag,
> although it may be a relative newcomer, and hasn't grabbed the headlines
> like keywords, perhaps.
>
> If I remember correctly, at Cobuild, we first used bigrams for the BBC
> dictionary (published in 1992). I don't think n-grams was a feature of
> the earlier versions of WordSmith, and even in the more recent
> AntConc, the n-grams option is slightly hidden.
>
> Since the 1990s, I have used n-grams as a routine part of corpus
> analysis, if they are available in the software I am using at the time,
> for a variety of purposes (eg investigating language varieties in 'The
> Globalization of Business English?' at Complex 2001; investigating
> genre features in 'A corpus-based analysis of junk emails' at LREC
> 2002; and recently, to compare Business Spanish and Business French
> in research for the COMENEGO project).
>
> Access to Google n-grams seems to have sparked interest in studies
> into historical changes in social, cultural, and political values?
>
> best
>
> Ramesh
>
> -----------------------------------------------------------------------
>
> Date: Thu, 18 Jul 2013 09:51:30 +0200
> From: Cedric Krummes <cedric.krummes at uni-leipzig.de>
> Subject: [Corpora-List] Uses of N-grams?
> To: Corpora at uib.no
>
> Hello,
>
> Regarding n-grams (highly frequent word sequences like "on the other hand"
> or "why don't you"), does anybody have any uses for them apart from language
> teaching?
>
> Most literature dealing with n-grams seems to apply them to foreign
> language teaching, second language acquisition, or English for X purposes.
> Any other uses?
>
> Best wishes,
>
> Cédric Krummes
> --
> Dr. Cédric Krummes
>
> Universität Leipzig · +49-341-97-37404
> http://www.cedrickrummes.org/contact.php
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora



