<div dir="ltr"><div class="gmail_quote"><div dir="ltr"><div>Dear all,</div><div><br></div><div>The Sketch Engine now supports term extraction for many languages, and we want to evaluate it.</div><div><br></div><div>For that, we need domain corpora in which somebody has manually identified all the 'true' terms. Then we can compute our system's precision and recall.</div>


<div><br></div><div>We are aware of GENIA, for English, and are already using it (key citation: Z Zhang, J Iria, CA Brewster, F Ciravegna, <a href="http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=VsRwsN8AAAAJ&citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C" target="_blank">A comparative evaluation of term recognition algorithms</a>, 2008).</div>


<div><br></div><div>Any corpus with "the terms it contains", conscientiously produced, will help us.</div><div><br></div><div>Pointers please!</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>

Adam<br clear="all"><div><br></div>-- <br>========================================<br>
<a href="http://www.kilgarriff.co.uk/" target="_blank">Adam Kilgarriff</a>                  <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a>                                             <br>


Director                                    <a href="http://www.sketchengine.co.uk/" target="_blank">Lexical Computing Ltd</a>                <br>Visiting Research Fellow                 <a href="http://leeds.ac.uk" target="_blank">University of Leeds</a>     <div>


<i><font color="#006600">Corpora for all</font></i> with <a href="http://www.sketchengine.co.uk" target="_blank">the Sketch Engine</a>                 </div><div>                        <i><a href="http://www.webdante.com" target="_blank">DANTE: <font color="#009900">a lexical database for English</font></a><font color="#009900"> </font>                 </i><div>


========================================</div></div>
</div></font></span></div>
</div><br><br>
</div>