[Corpora-List] Resources for evaluating term extraction

Thu Feb 20 11:46:10 UTC 2014

Hi Adam,

In our previous work [1], carried out in the context of the Saffron 
project [2], we were interested in cross-domain evaluation of term 
extraction. Because we did not find other datasets similar to GENIA, we 
relied on datasets annotated for keyphrase extraction [3] and index term 
assignment [4], which are more abundant.

Regards,
Georgeta

[1] https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59
[2] http://saffron.deri.ie/
[3] https://github.com/snkim/AutomaticKeyphraseExtraction
[4] http://code.google.com/p/maui-indexer/wiki/Resources

On 20/02/14 11:00, corpora-request at uib.no wrote:
> Message: 2
> Date: Wed, 19 Feb 2014 11:34:36 +0000
> From: Adam Kilgarriff <adam at lexmasterclass.com>
> Subject: [Corpora-List] Resources for evaluating term extraction
> To: "corpora at hd.uib.no" <corpora at hd.uib.no>
>
> Dear all,
>
> The Sketch Engine now supports term extraction for many languages - and we
> want to evaluate it.
>
> For that, we need domain corpora in which somebody has gone through
> identifying all the 'true' terms.  Then we can compute our system's
> precision and recall.
>
> We are aware of GENIA, for English, and are using that already (key
> citation: Z Zhang, J Iria, CA Brewster, F Ciravegna, 2008, "A comparative
> evaluation of term recognition algorithms"
> <http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=VsRwsN8AAAAJ&citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C>)
>
> Any corpus with "the terms it contains", conscientiously produced, will
> help us.
>
> Pointers please!
>
> Adam
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/>
> adam at lexmasterclass.com
> Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
> Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
> *Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
> *DANTE: a lexical database for English* <http://www.webdante.com>
> ========================================
