<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--

/* Font Definitions */

@font-face

        {font-family:SimSun;

        panose-1:2 1 6 0 3 1 1 1 1 1;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

@font-face

        {font-family:"\@SimSun";

        panose-1:2 1 6 0 3 1 1 1 1 1;}

@font-face

        {font-family:"Microsoft YaHei";

        panose-1:0 0 0 0 0 0 0 0 0 0;}

@font-face

        {font-family:"\@Microsoft YaHei";

        panose-1:0 0 0 0 0 0 0 0 0 0;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:SimSun;

        mso-fareast-language:ZH-CN;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p

        {mso-style-priority:99;

        mso-margin-top-alt:auto;

        margin-right:0cm;

        mso-margin-bottom-alt:auto;

        margin-left:0cm;

        font-size:12.0pt;

        font-family:SimSun;

        mso-fareast-language:ZH-CN;}

span.EstiloCorreo18

        {mso-style-type:personal-reply;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:70.85pt 3.0cm 70.85pt 3.0cm;}

div.WordSection1

        {page:WordSection1;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=ES-MX link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dear all,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Perhaps Gang’s use of the term “syntactic N-grams” is a bit misleading: this term was recently introduced in [1] and related work to mean N-grams in the syntactic metric, that is, N-grams whose words are adjacent syntactically rather than linearly (formally: small subtrees of the syntactic dependency tree). They can be used wherever usual N-grams are used, and they have an advantage over usual N-grams because they introduce syntactic information into machine learning.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Perhaps what Gang meant was that he wants to extract syntactic N-grams (SVO triples in his case) from conventional N-grams (the Google corpus in his case).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[1] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, L. Chanona-Hernández. </span><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification. CICLing 2013. LNCS 7816, pp. 13–24. 
<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Grigori Sidorov<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></p><div style='border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt'><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>tg<br><b>Sent:</b> Wednesday, November 13, 2013 2:44 AM<br><b>To:</b> corpora@uib.no<br><b>Subject:</b> [Corpora-List] Questions for Google syntactic N-grams corpus<o:p></o:p></span></p></div></div><p class=MsoNormal><span lang=EN-US><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><a name="OLE_LINK57"></a><a name="OLE_LINK63"></a><a name="OLE_LINK58"></a><span lang=EN-US style='background:white'>Dear all,</span><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><a name="OLE_LINK59"><span lang=EN-US>I am extremely interested in the new edition of the Google N-grams corpus. My research topic is using sentence dependency parsing to mine web-scale text corpora for semantic understanding.</span></a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span lang=EN-US style='color:#444444'>I would like to ask two questions:</span><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span lang=EN-US style='color:#444444'>Q1: How can this large-scale data be used? Are there any existing tools, e.g. 
indexing and search tools like Lucene (which may not be applicable to data of this size)? Are there any other indexing tools?</span><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal style='margin-bottom:12.0pt'><span lang=EN-US style='font-family:"Microsoft YaHei","serif";color:#444444'>Q2: I want to extract typical dependency-relation triples (S-V-O, e.g. "lion - chase - zebra"). Could you advise me on how to do this efficiently?</span><span style='font-family:"Microsoft YaHei","serif"'><o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>Gang Tian |</span><span style='font-size:11.5pt;color:#444444'>&nbsp;</span><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>PhD Student<o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>School of Information Technologies | Faculty of Engineering &amp; IT<o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>THE UNIVERSITY OF SYDNEY<o:p></o:p></span></p></div></div></div></body></html>