[Corpora-List] Resources for evaluating term extraction
Georgeta Bordea
georgeta.bordea at deri.org
Thu Feb 20 11:46:10 UTC 2014
Hi Adam,
In our previous work [1], done in the context of the Saffron project [2],
we were interested in cross-domain evaluation of term extraction.
Because we did not find other datasets similar to GENIA, we relied on
datasets annotated for keyphrase extraction [3] and index term
assignment [4], which are more abundant.
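
As a rough illustration (not taken from [1]), scoring an extractor against
the gold-standard term lists in such datasets could look like the Python
sketch below. The function names, the exact-match criterion, and the
lowercase/whitespace normalization are all assumptions made for the example;
a real evaluation may well need lemmatization or partial matching.

```python
def normalize(term):
    """Lowercase and collapse whitespace so trivial variants match.

    Assumption for this sketch: no lemmatization or stemming is applied.
    """
    return " ".join(term.lower().split())


def evaluate_terms(extracted, gold):
    """Return (precision, recall, f1) using exact match on normalized terms."""
    extracted_set = {normalize(t) for t in extracted}
    gold_set = {normalize(t) for t in gold}

    # A true positive is an extracted term that also appears in the gold list.
    true_positives = len(extracted_set & gold_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    gold = ["keyphrase extraction", "term extraction", "index term"]
    extracted = ["Term  Extraction", "index term", "mailing list", "corpus"]
    p, r, f = evaluate_terms(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Both lists are treated as sets, so a term extracted several times counts once.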
Regards,
Georgeta
[1] https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59
[2] http://saffron.deri.ie/
[3] https://github.com/snkim/AutomaticKeyphraseExtraction
[4] http://code.google.com/p/maui-indexer/wiki/Resources
On 20/02/14 11:00, corpora-request at uib.no wrote:
> Message: 2
> Date: Wed, 19 Feb 2014 11:34:36 +0000
> From: Adam Kilgarriff <adam at lexmasterclass.com>
> Subject: [Corpora-List] Resources for evaluating term extraction
> To: "corpora at hd.uib.no" <corpora at hd.uib.no>
>
> Dear all,
>
> The Sketch Engine now supports term extraction for many languages - and we
> want to evaluate it.
>
> For that, we need domain corpora in which somebody has gone through
> identifying all the 'true' terms. Then we can compute our system's
> precision and recall.
>
> We are aware of GENIA, for English, and are using that already (key
> citation here: Z Zhang, J Iria, CA Brewster, F Ciravegna, 2008,
> "A comparative evaluation of term recognition algorithms"
> <http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=VsRwsN8AAAAJ&citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C>).
>
> Any corpus with "the terms it contains", conscientiously produced, will
> help us.
>
> Pointers please!
>
> Adam
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/>
> adam at lexmasterclass.com
> Director
> Lexical Computing Ltd <http://www.sketchengine.co.uk/>
> Visiting Research Fellow
> University of Leeds <http://leeds.ac.uk>
> *Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
> *DANTE: a lexical database for English* <http://www.webdante.com>
> ========================================