<div dir="ltr">Apologies - I missed out<div><br></div><div>6.  <span style="font-family:arial,sans-serif;font-size:13px;white-space:nowrap">María José Marín Pérez has created a corpus of legal English (BLARC) and has used it for extensive term-extraction </span></div>


<div><span style="font-family:arial,sans-serif;font-size:13px;white-space:nowrap">experiments, and can provide both the corpus and the lists of terms (paper submitted to COLING)</span></div><div><span style="font-family:arial,sans-serif;font-size:13px;white-space:nowrap"><br>


</span></div><div><span style="font-family:arial,sans-serif;font-size:13px;white-space:nowrap">Adam</span></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 24 February 2014 10:16, Adam Kilgarriff <span dir="ltr"><<a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear all<div><br></div><div>Here is a summary of responses to my request for resources for evaluating term extraction.</div>


<div><br></div><div>1.  The TTC project has prepared corpora and terms for two domains and seven languages: see</div>

<blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">





</blockquote><a href="http://www.lina.univ-nantes.fr/?Reference-Term-Lists-of-TTC.html" target="_blank">http://www.lina.univ-nantes.fr/?Reference-Term-Lists-of-TTC.html</a><br></div></div></div></blockquote>




Thanks to Anne Schumann <br><br>2.  The ACL Anthology corpus has been marked up with "valid terms" and "technology terms".<br>Thanks to Behrang Qasemizadeh <br><br><div class="gmail_extra"><div class="gmail_quote">




<div>3. Georgeta Bordea says:</div></div></div><div><div class="gmail_extra"><div class="gmail_quote"><div><span style="font-size:13px;font-family:arial,sans-serif">In our previous work [1] done in the context of the Saffron project [2] we were interested in cross-domain evaluation of </span><span style="font-size:13px;background-color:rgb(255,255,204);font-family:arial,sans-serif">term </span><span style="font-size:13px;font-family:arial,sans-serif"></span><span style="font-size:13px;background-color:rgb(255,255,204);font-family:arial,sans-serif">extraction</span><span style="font-size:13px;font-family:arial,sans-serif">. Because we did not find other datasets similar to GENIA we relied on datasets annotated </span><span style="font-size:13px;background-color:rgb(255,255,204);font-family:arial,sans-serif">for</span><span style="font-size:13px;font-family:arial,sans-serif"> keyphrase </span><span style="font-size:13px;background-color:rgb(255,255,204);font-family:arial,sans-serif">extraction</span><span style="font-size:13px;font-family:arial,sans-serif"> [3] and index </span><span style="font-size:13px;background-color:rgb(255,255,204);font-family:arial,sans-serif">term</span><span style="font-size:13px;font-family:arial,sans-serif"> assignment [4] which are more abundant. </span></div>




</div></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><span style="font-size:13px;font-family:arial,sans-serif">[1] </span><a href="https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59" style="font-size:13px;font-family:arial,sans-serif" target="_blank">https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59<br>




</a><span style="font-size:13px;font-family:arial,sans-serif">[2] </span><a href="http://saffron.deri.ie/" style="font-size:13px;font-family:arial,sans-serif" target="_blank">http://saffron.deri.ie/<br></a><span style="font-size:13px;font-family:arial,sans-serif">[3] </span><a href="https://github.com/snkim/AutomaticKeyphraseExtraction" style="font-size:13px;font-family:arial,sans-serif" target="_blank">https://github.com/snkim/AutomaticKeyphraseExtraction</a><div>




<div class="gmail_extra"><div class="gmail_quote"><span style="font-family:arial,sans-serif;font-size:13px">[4] </span><a href="http://code.google.com/p/maui-indexer/wiki/Resources" style="font-family:arial,sans-serif;font-size:13px" target="_blank">http://code.google.com/p/maui-indexer/wiki/Resources</a></div>




</div></div></blockquote><div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;border-right-width:1px;border-right-color:rgb(204,204,204);border-right-style:solid;padding-left:1ex;padding-right:1ex">




</blockquote><div><br></div><div>4. Kevin Cohen and Sophia Ananiadou pointed to resources related to the Termine tool; however, these did not include reference lists of 'gold standard' terms.</div><div><br></div><div>




5. Viktor Pekar pointed to a SemEval task which included "aspect term extraction" in the domain of restaurant reviews, where the aspect terms are, for example, "service" and "staff" in the sentence "I liked the service and the staff".  See <a href="http://alt.qcri.org/semeval2014/task4/" style="font-size:13px;font-family:arial,sans-serif" target="_blank">http://alt.qcri.org/semeval2014/task4/</a><span style="font-size:13px;font-family:arial,sans-serif">  </span>This wasn't quite what we were looking for.</div>




<div><br></div><div>Thanks all</div><div><br></div><div>Adam</div><div><br></div><div>========original post====================</div><div><span style="font-family:arial,sans-serif;font-size:13px">Date: Wed, 19 Feb 2014 11:34:36 +0000</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">Subject: [Corpora-List] Resources </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">for</span><span style="font-family:arial,sans-serif;font-size:13px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">evaluating</span><span style="font-family:arial,sans-serif;font-size:13px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">term</span><span style="font-family:arial,sans-serif;font-size:13px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">extraction</span><br style="font-family:arial,sans-serif;font-size:13px">




<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">Dear all,</span><br style="font-family:arial,sans-serif;font-size:13px"><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">The Sketch Engine now supports </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">term</span><span style="font-family:arial,sans-serif;font-size:13px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">extraction</span><span style="font-family:arial,sans-serif;font-size:13px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">for</span><span style="font-family:arial,sans-serif;font-size:13px"> many languages - and we</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">want to evaluate it.</span><br style="font-family:arial,sans-serif;font-size:13px"><br style="font-family:arial,sans-serif;font-size:13px"><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">For</span><span style="font-family:arial,sans-serif;font-size:13px"> that, we need domain corpora in which somebody has gone through</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">identifying all the 'true' terms.  Then we can compute our system's</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">precision and recall.</span><br style="font-family:arial,sans-serif;font-size:13px">
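<span style="font-family:arial,sans-serif;font-size:13px">A minimal sketch of that precision and recall computation, assuming the gold-standard terms and the system output are both plain lists of strings. The function names, the lowercase normalisation and the toy data are illustrative assumptions, not part of the Sketch Engine or of any of the resources discussed here.</span><br style="font-family:arial,sans-serif;font-size:13px">
<pre>
```python
def normalise(term):
    # Assumption: case and extra whitespace are not significant when matching.
    return " ".join(term.lower().split())


def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against a gold list."""
    extracted_set = {normalise(t) for t in extracted}
    gold_set = {normalise(t) for t in gold}
    true_positives = len(extracted_set.intersection(gold_set))
    # Guard against empty lists so the function never divides by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from GENIA or any real corpus.
gold_terms = ["gene expression", "T cell", "transcription factor", "protein kinase"]
system_terms = ["Gene expression", "t cell", "cell line"]

p, r = precision_recall(system_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```
</pre>
<span style="font-family:arial,sans-serif;font-size:13px">Exact matching after normalisation is the strictest option; matching on lemmas or allowing partial overlaps would probably give different figures.</span><br style="font-family:arial,sans-serif;font-size:13px">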




<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">We are aware of GENIA, </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">for</span><span style="font-family:arial,sans-serif;font-size:13px"> English, and are using that already (key</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">citation here: A comparative evaluation of </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13px">term</span><span style="font-family:arial,sans-serif;font-size:13px"> recognition</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">algorithms</span><span style="font-family:arial,sans-serif;font-size:13px">. 2008: Z Zhang, J Iria, CA Brewster, F Ciravegna)</span><br style="font-family:arial,sans-serif;font-size:13px">




<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">Any corpus with "the terms it contains", conscientiously produced, will</span><br style="font-family:arial,sans-serif;font-size:13px">




<span style="font-family:arial,sans-serif;font-size:13px">help us.</span><br style="font-family:arial,sans-serif;font-size:13px"><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">Pointers please!</span><span><font color="#888888"><br style="font-family:arial,sans-serif;font-size:13px">




<br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">Adam</span><br style="font-family:arial,sans-serif;font-size:13px"></font></span></div></div><span><font color="#888888">-- <br>


========================================<br>

<a href="http://www.kilgarriff.co.uk/" target="_blank">Adam Kilgarriff</a>                  <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a>                                             <br>




Director                                    <a href="http://www.sketchengine.co.uk/" target="_blank">Lexical Computing Ltd</a>                <br>Visiting Research Fellow                 <a href="http://leeds.ac.uk" target="_blank">University of Leeds</a>     <div>




<i><font color="#006600">Corpora for all</font></i> with <a href="http://www.sketchengine.co.uk" target="_blank">the Sketch Engine</a>                 </div><div>                        <i><a href="http://www.webdante.com" target="_blank">DANTE: <font color="#009900">a lexical database for English</font></a><font color="#009900"> </font>                 </i><div>




========================================</div></div>
</font></span></div></div></div>
</blockquote></div><br>
</div></div>