<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hi Adam,<br>
      <br>
      In our previous work [1], carried out as part of the Saffron
      project [2], we looked at cross-domain evaluation of term
      extraction. Because we did not find other datasets similar to
      GENIA, we relied on datasets annotated for keyphrase extraction [3]
      and index term assignment [4], which are more abundant.<br>
      <br>
      Regards,<br>
      Georgeta<br>
      <br>
      [1]
<a class="moz-txt-link-freetext" href="https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59">https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59</a><br>
      [2] <a class="moz-txt-link-freetext" href="http://saffron.deri.ie/">http://saffron.deri.ie/</a><br>
      [3] <a class="moz-txt-link-freetext" href="https://github.com/snkim/AutomaticKeyphraseExtraction">https://github.com/snkim/AutomaticKeyphraseExtraction</a><br>
      [4] <a class="moz-txt-link-freetext" href="http://code.google.com/p/maui-indexer/wiki/Resources">http://code.google.com/p/maui-indexer/wiki/Resources</a><br>
      <br>
      On 20/02/14 11:00, <a class="moz-txt-link-abbreviated" href="mailto:corpora-request@uib.no">corpora-request@uib.no</a> wrote:<br>
    </div>
    <blockquote cite="mid:mailman.17.1392894004.16522.corpora@uib.no"
      type="cite">
      <pre wrap="">Message: 2
Date: Wed, 19 Feb 2014 11:34:36 +0000
From: Adam Kilgarriff <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:adam@lexmasterclass.com">&lt;adam@lexmasterclass.com&gt;</a>
Subject: [Corpora-List] Resources for evaluating term extraction
To: <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:corpora@hd.uib.no">"corpora@hd.uib.no"</a> <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="mailto:corpora@hd.uib.no">&lt;corpora@hd.uib.no&gt;</a>

Dear all,

The Sketch Engine now supports term extraction for many languages - and we
want to evaluate it.

For that, we need domain corpora in which somebody has gone through
identifying all the 'true' terms.  Then we can compute our system's
precision and recall.

We are aware of GENIA, for English, and are using that already (key
citation here: A comparative evaluation of term recognition
algorithms<a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=VsRwsN8AAAAJ&citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C">&lt;http://scholar.google.co.uk/citations?view_op=view_citation&amp;hl=en&amp;user=VsRwsN8AAAAJ&amp;citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C&gt;</a>
 2008: Z Zhang, J Iria, CA Brewster, F Ciravegna)

Any corpus with "the terms it contains", conscientiously produced, will
help us.

Pointers please!

Adam

<div class="moz-txt-sig">-- 
========================================
Adam Kilgarriff <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://www.kilgarriff.co.uk/">&lt;http://www.kilgarriff.co.uk/&gt;</a>
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</a>
Director                                    Lexical Computing
Ltd<a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://www.sketchengine.co.uk/">&lt;http://www.sketchengine.co.uk/&gt;</a>

Visiting Research Fellow                 University of
Leeds<a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://leeds.ac.uk">&lt;http://leeds.ac.uk&gt;</a>

<b class="moz-txt-star"><span class="moz-txt-tag">*</span>Corpora for all<span class="moz-txt-tag">*</span></b> with the Sketch Engine <a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://www.sketchengine.co.uk">&lt;http://www.sketchengine.co.uk&gt;</a>

                        *DANTE: a lexical database for English
<a moz-do-not-send="true" class="moz-txt-link-rfc2396E" href="http://www.webdante.com">&lt;http://www.webdante.com&gt;</a>                  *
========================================
</div></pre>
    </blockquote>
    <br>
  </body>
</html>