[Corpora-List] Questions for Google syntactic N-grams corpus

Wed Nov 13 08:44:21 UTC 2013

Dear all,

I am extremely interested in the new edition of the Google N-grams
corpus. My research topic is using sentence dependency parsing to mine
web-scale text corpora for semantic understanding.

I would like to ask two questions:

Q1: How should this large-scale data be used? Are there any existing
tools, e.g. indexing and search tools like Lucene (which may not be
suitable for data of this size)? Are there any other indexing tools?

Q2: I want to extract typical dependency-relation triplets (S-V-O,
e.g. "lion - chase - zebra"). Could you advise me on how to do this
efficiently?
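
To make Q2 concrete, here is a minimal sketch of the kind of extraction
meant. It rests on an assumption about the record layout, namely
tab-separated fields (head word, syntactic n-gram, total count, per-year
counts) with each token written as word/pos-tag/dep-label/head-index,
where head-index is 1-based and 0 marks the root. The label names
"nsubj" and "dobj", the helper names, and the sample line are all
hypothetical and should be checked against the corpus documentation.

```python
from collections import Counter

# Assumed dependency labels for subject and direct object.
SUBJ_LABELS = {"nsubj"}
OBJ_LABELS = {"dobj"}

def parse_tokens(ngram):
    """Split an n-gram into (word, pos, dep, head_index) tuples."""
    tokens = []
    for tok in ngram.split(" "):
        # rsplit, because the word itself may contain "/".
        word, pos, dep, head = tok.rsplit("/", 3)
        tokens.append((word, pos, dep, int(head)))
    return tokens

def svo_from_line(line):
    """Return ((subject, verb, object), total_count) pairs for one record."""
    fields = line.rstrip("\n").split("\t")
    tokens = parse_tokens(fields[1])
    total = int(fields[2])
    triples = []
    for i, (verb, pos, _dep, _head) in enumerate(tokens, start=1):
        if not pos.startswith("VB"):
            continue
        # Dependents whose head index points at this verb.
        subjects = [w for w, _, d, h in tokens if h == i and d in SUBJ_LABELS]
        objects = [w for w, _, d, h in tokens if h == i and d in OBJ_LABELS]
        for s in subjects:
            for o in objects:
                triples.append(((s, verb, o), total))
    return triples

def count_svo(lines):
    """Sum total counts per S-V-O triple over an iterable of records."""
    counts = Counter()
    for line in lines:
        for triple, n in svo_from_line(line):
            counts[triple] += n
    return counts

# Hypothetical record in the assumed layout (not taken from the corpus).
sample = ["chase\tlion/NN/nsubj/2 chase/VBP/ROOT/0 zebra/NN/dobj/2\t12\t1990,5\t2000,7"]
svo_counts = count_svo(sample)
```

Since count_svo accepts any iterable of lines, the files could be
streamed one at a time (e.g. through gzip.open in text mode) without
first building an index.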

Gang Tian | PhD Student
School of Information Technologies | Faculty of Engineering & IT
THE UNIVERSITY OF SYDNEY
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora