[Corpora-List] On the term "(dependency) syntactic N-grams"

Grigori Sidorov sidorov at cic.ipn.mx
Fri Nov 15 20:13:01 UTC 2013


Dear all,

Gang’s use of the term “syntactic N-grams” may be a bit misleading. The
term was recently introduced in [1] and related work to mean n-grams taken
in the syntactic metric, that is, n-grams whose words are adjacent in the
dependency tree rather than in the linear word order (formally: small
sub-trees of the dependency syntactic tree). They can be used wherever usual
N-grams are used, and they have the advantage of bringing syntactic
information into machine learning.
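
To make the definition concrete, here is a minimal sketch in Python. It is
only an illustration, not the implementation used in [1]: the function name,
the head-index encoding of the tree and the example parse are all
assumptions, and it enumerates only head-to-dependent paths, not sub-trees
that branch.

```python
from collections import defaultdict


def syntactic_ngrams(words, heads, n):
    """Return all head-to-dependent paths of n words in a dependency tree.

    words: list of tokens.
    heads: heads[i] is the index of the head of words[i], or -1 for the root.
    (This encoding is an assumption made for the sketch.)
    """
    # Build a head -> dependents map from the head indices.
    children = defaultdict(list)
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node, length):
        # All downward paths of exactly `length` nodes starting at `node`.
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [tuple(words[i] for i in path)
            for start in range(len(words))
            for path in paths_from(start, n)]


# Hand-written (hypothetical) parse of "the lion chased a zebra".
words = ["the", "lion", "chased", "a", "zebra"]
heads = [1, 2, -1, 4, 2]

bigrams = syntactic_ngrams(words, heads, 2)
trigrams = syntactic_ngrams(words, heads, 3)
print(bigrams)
print(trigrams)
```

Note that ("chased", "zebra") comes out as a syntactic bigram although the
two words are not adjacent in the sentence, while the linear bigram
("chased", "a") does not appear.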

Perhaps what Gang meant was that he wants to extract syntactic N-grams (SVO
triples in his case) from conventional N-grams (Google corpus in his case).
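
As a rough illustration of what such an SVO triple looks like in dependency
terms, the sketch below pairs the subject and the direct object attached to
the same verb. The arc format (head, label, dependent) and the label names
"nsubj" and "dobj" are assumptions in the style of Stanford dependencies;
this is not the format of the Google corpus, and a real run over parsed text
would use token positions rather than word strings to identify the head.

```python
def svo_triples(arcs):
    """Collect (subject, verb, object) triples from labelled dependency arcs.

    arcs: iterable of (head, label, dependent) tuples for ONE sentence.
    The labels "nsubj" and "dobj" are assumed, Stanford-style names.
    """
    subjects, objects = {}, {}
    for head, label, dep in arcs:
        if label == "nsubj":
            subjects.setdefault(head, []).append(dep)
        elif label == "dobj":
            objects.setdefault(head, []).append(dep)

    # A verb yields a triple only if it has both a subject and an object.
    return [(subj, verb, obj)
            for verb in subjects
            if verb in objects
            for subj in subjects[verb]
            for obj in objects[verb]]


# Hypothetical arcs for a sentence like "a lion chases a zebra" (lemmatized).
arcs = [
    ("chase", "nsubj", "lion"),
    ("chase", "dobj", "zebra"),
    ("zebra", "det", "a"),
]

triples = svo_triples(arcs)
print(triples)
```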

[1] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, L.
Chanona-Hernández. Syntactic Dependency-Based N-grams: More Evidence of
Usefulness in Classification. CICLing 2013. LNCS 7816, pp. 13–24. 

Grigori Sidorov

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of tg
Sent: Wednesday, November 13, 2013 2:44 AM
To: corpora at uib.no
Subject: [Corpora-List] Questions for Google syntactic N-grams corpus

Hi, dear all,

I am extremely interested in the new edition of the Google N-grams corpus.
My research topic is using sentence dependency parsing to mine web-scale
text corpora for semantic understanding.

I would like to ask two questions:

Q1: How can this large-scale data be used? Are there any existing tools,
e.g. indexing and search tools like Lucene (which may not work for data
this large)? Are there any other indexing tools?

Q2: I want to extract typical dependency triples (S-V-O, e.g.
"lion - chase - zebra"). Could you help me with how to do this efficiently?

Gang Tian | PhD Student

School of Information Technologies | Faculty of Engineering & IT

THE UNIVERSITY OF SYDNEY
