<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Microsoft YaHei";
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"\@Microsoft YaHei";
panose-1:0 0 0 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:SimSun;
mso-fareast-language:ZH-CN;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:SimSun;
mso-fareast-language:ZH-CN;}
span.EstiloCorreo18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 3.0cm 70.85pt 3.0cm;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=ES-MX link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Dear all,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Gang’s use of the term “syntactic N-grams” may be a bit misleading: the term was recently introduced in [1] and related work to mean N-grams in the syntactic metric, that is, N-grams whose words are adjacent syntactically rather than linearly (formally, small subtrees of the syntactic dependency tree). They can be used wherever conventional N-grams are used, and they have the advantage of bringing syntactic information into machine learning.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Perhaps what Gang meant was that he wants to extract syntactic N-grams (SVO triples in his case) from conventional N-grams (the Google corpus in his case).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>[1] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, L. Chanona-Hernández. </span><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification. CICLing 2013. LNCS 7816, pp. 13–24.
<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Grigori Sidorov<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div style='border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt'><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>tg<br><b>Sent:</b> Wednesday, November 13, 2013 2:44 AM<br><b>To:</b> corpora@uib.no<br><b>Subject:</b> [Corpora-List] Questions for Google syntactic N-grams corpus<o:p></o:p></span></p></div></div><p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p><div><p class=MsoNormal><a name="OLE_LINK57"></a><a name="OLE_LINK63"></a><a name="OLE_LINK58"></a><span lang=EN-US style='background:white'>Hi, dear all,</span><o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><a name="OLE_LINK59"><span lang=EN-US>I am extremely interested in the new edition of the Google N-grams corpus. My research topic is using sentence dependency parsing to mine web-scale text corpora for semantic understanding.</span></a><o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><span lang=EN-US style='color:#444444'>I would like to ask the following two questions:</span><o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal><span lang=EN-US style='color:#444444'>Q1: How can this large-scale data be used? Are there any existing tools, e.g.
indexing and search tools like Lucene (which may not be usable for data this large)? Are there any other indexing tools?</span><o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal style='margin-bottom:12.0pt'><span lang=EN-US style='font-family:"Microsoft YaHei","serif";color:#444444'>Q2: I want to extract typical dependency-relation triples (S-V-O, e.g. "lion - chase - zebra"). Could you advise me on how to do this efficiently?</span><span style='font-family:"Microsoft YaHei","serif"'><o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>Gang Tian |</span><span style='font-size:11.5pt;color:#444444'> </span><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>PhD Student<o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>School of Information Technologies | Faculty of Engineering &amp; IT<o:p></o:p></span></p><p class=MsoNormal style='line-height:16.35pt'><span style='font-size:11.5pt;font-family:"Microsoft YaHei","serif";color:#444444'>THE UNIVERSITY OF SYDNEY<o:p></o:p></span></p></div></div></div></body></html>