[Corpora-List] Summary: resources for evaluating term extraction

Adam Kilgarriff adam at lexmasterclass.com
Mon Feb 24 10:16:28 UTC 2014


Dear all

Here is a summary of responses to my request for resources for evaluating
term extraction.

1. The TTC project has prepared corpora and terms for two domains and seven
languages: see

http://www.lina.univ-nantes.fr/?Reference-Term-Lists-of-TTC.html

Thanks to Anne Schumann

2. The ACL Anthology corpus has been marked up with "valid terms" and
"technology terms".
Thanks to Behrang Qasemizadeh

3. Georgeta Bordea says:
In our previous work [1] done in the context of the Saffron project [2] we
were interested in cross-domain evaluation of term extraction. Because we
did not find other datasets similar to GENIA we relied on datasets
annotated for keyphrase extraction [3] and index term assignment [4] which
are more abundant.

[1]
https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59
[2] http://saffron.deri.ie/
[3] https://github.com/snkim/AutomaticKeyphraseExtraction
[4] http://code.google.com/p/maui-indexer/wiki/Resources


4. Kevin Cohen and Sophia Ananiadou pointed to resources related to the
TerMine tool; however, these resources did not include reference lists of
'gold standard' terms.

5. Viktor Pekar pointed to a SemEval task which included "aspect term
extraction" in the domain of restaurant reviews. The aspect terms there are,
for example, "service" and "staff" in the sentence "I liked the service and
the staff". See http://alt.qcri.org/semeval2014/task4/
This wasn't quite what we were looking for.

Thanks all

Adam

========original post====================
Date: Wed, 19 Feb 2014 11:34:36 +0000
Subject: [Corpora-List] Resources for evaluating term extraction

Dear all,

The Sketch Engine now supports term extraction for many languages - and we
want to evaluate it.

For that, we need domain corpora in which somebody has gone through
identifying all the 'true' terms.  Then we can compute our system's
precision and recall.
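
As a minimal sketch of that computation (not the Sketch Engine's actual
evaluation code): it assumes both the system output and the gold standard
are plain lists of term strings, and that a match means exact string
equality after lowercasing and trimming whitespace. The function name and
the toy term lists are hypothetical.

```python
def precision_recall(extracted, gold):
    """Score extracted terms against a gold-standard term list.

    Assumption: a term counts as correct only if it matches a gold term
    exactly after lowercasing and stripping surrounding whitespace.
    """
    extracted_set = {t.strip().lower() for t in extracted}
    gold_set = {t.strip().lower() for t in gold}
    true_positives = len(extracted_set & gold_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold_terms = ["gene expression", "transcription factor", "T cell", "NF-kappa B"]
system_terms = ["gene expression", "t cell", "cell line",
                "transcription factor", "patient"]

p, r = precision_recall(system_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; an evaluation could also normalise
by lemma or allow partial matches, which would change both scores.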

We are aware of GENIA, for English, and are using that already (key
citation: Z Zhang, J Iria, CA Brewster, F Ciravegna. 2008. A comparative
evaluation of term recognition algorithms).

Any corpus with "the terms it contains", conscientiously produced, will
help us.

Pointers please!

Adam
-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>

*DANTE: a lexical database for English* <http://www.webdante.com>
========================================