[Corpora-List] Questions for Google syntactic N-grams corpus

Wed Nov 13 14:05:00 UTC 2013

An alternate interface for the Google Books n-grams is at:

http://googlebooks.byu.edu/

This interface allows you to search by part of speech, lemma, synonyms, collocates, and to compare results across different portions of the n-grams datasets. For a comparison of this interface and the standard Google Books n-grams interface, see:

http://googlebooks.byu.edu/compare-googleBooks.asp

Also, here are a few quick links showing the kinds of displays one can get from the data:

http://googlebooks.byu.edu/?c=us&q=26566890
All matching strings, by decade

http://googlebooks.byu.edu/?c=us&q=26566893
Overall frequency of all matching strings, by decade

http://googlebooks.byu.edu/?c=us&q=26566903
Matching strings for just one part of the corpus (here, 1990s-2000s)

Best,

Mark Davies

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of tg [beijixingboy at hotmail.com]
Sent: Wednesday, November 13, 2013 1:44 AM
To: corpora at uib.no
Subject: [Corpora-List] Questions for Google syntactic N-grams corpus

Hi all,

I am extremely interested in the new edition of the Google N-grams corpus. My research topic is using dependency parsing to mine web-scale text corpora for semantic understanding.

I would like to ask two questions:

Q1: How can I use this large-scale data? Are there any existing tools, e.g. indexing and search tools like Lucene (which may not scale to data this big)? Are there other indexing tools?
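For targeted extraction, a single streaming pass over the files can sometimes replace an index altogether. The sketch below keeps only lines whose head word is in a chosen set. It assumes the tab-separated layout described in the syntactic n-grams README (head word, syntactic n-gram, total count, then per-year counts) and that files may be gzip-compressed; the function names are hypothetical, so check the layout against the README before relying on it.

```python
import gzip
from typing import Iterable, Iterator, Set, Tuple


def read_lines(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed text file without loading it all into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        yield from f


def filter_by_head(lines: Iterable[str], heads: Set[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (head_word, syntactic_ngram, total_count) for lines whose head word is in `heads`.

    Assumed layout (hypothetical, verify against the README):
    head_word<TAB>syntactic_ngram<TAB>total_count<TAB>year,count<TAB>...
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] not in heads:
            continue  # skip malformed lines and unwanted head words
        yield fields[0], fields[1], int(fields[2])
```

Usage would look like `filter_by_head(read_lines(path), {"chase"})`, where `path` is a placeholder for one of the downloaded files. Because everything is a generator, memory use stays flat regardless of file size.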

Q2: I want to extract typical dependency triples (S-V-O, e.g. "lion - chase - zebra"). Could you advise me on how to do this efficiently?
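One way to approach S-V-O extraction is to parse each syntactic n-gram into tokens and pair every `nsubj` dependent with every `dobj` dependent of the same verb. This is a minimal sketch assuming the token format word/POS/dep-label/head-index (head-index 1-based within the n-gram, 0 for the n-gram's root) and Stanford-style labels such as `nsubj` and `dobj`; all function names are hypothetical. Which file set to run it over, and whether summing counts across n-grams double-counts a sentence, depends on how that set was built, so check the README.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed token format (verify against the README): word/POS/dep-label/head-index
Token = Tuple[str, str, str, int]


def parse_ngram(ngram: str) -> List[Token]:
    tokens = []
    for item in ngram.split(" "):
        # rsplit keeps words that themselves contain "/" intact
        word, pos, dep, head = item.rsplit("/", 3)
        tokens.append((word, pos, dep, int(head)))
    return tokens


def svo_triples(ngram: str) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) triples whose nsubj and dobj attach to the same verb."""
    tokens = parse_ngram(ngram)
    triples = []
    for v_idx, (verb, pos, _, _) in enumerate(tokens, start=1):
        if not pos.startswith("VB"):
            continue  # only Penn Treebank verb tags (VB, VBD, VBZ, ...)
        subjects = [w for w, _, d, h in tokens if h == v_idx and d == "nsubj"]
        objects = [w for w, _, d, h in tokens if h == v_idx and d == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((s, verb, o))
    return triples


def count_svo(lines: Iterable[str]) -> Counter:
    """Sum total counts per triple over lines of the assumed head<TAB>ngram<TAB>count<TAB>... layout."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        try:
            triples = svo_triples(fields[1])
        except ValueError:
            continue  # skip tokens that do not match the assumed format
        for triple in triples:
            counts[triple] += int(fields[2])
    return counts
```

For example, the illustrative n-gram `lion/NN/nsubj/2 chase/VBD/ROOT/0 zebra/NN/dobj/2` yields the single triple `("lion", "chase", "zebra")`.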

Gang Tian | PhD Student

School of Information Technologies | Faculty of Engineering & IT

THE UNIVERSITY OF SYDNEY
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora