[Corpora-List] Summary: resources for evaluating term extraction
Adam Kilgarriff
adam at lexmasterclass.com
Mon Feb 24 10:29:09 UTC 2014
Apologies - I missed one response out of the summary below:
6. María José Marín Pérez has created a corpus of legal English (BLARC)
and has used it for extensive term-extraction
experiments, and can provide both the corpus and the lists of terms (paper
submitted to COLING)
Adam
On 24 February 2014 10:16, Adam Kilgarriff <adam at lexmasterclass.com> wrote:
> Dear all
>
> here is a summary of responses to my request for resources for evaluating
> term extraction.
>
> 1. TTC project has prepared corpora and terms for 2 domains and seven
> languages: see
>
> http://www.lina.univ-nantes.fr/?Reference-Term-Lists-of-TTC.html
>
> Thanks to Anne Schumann
>
> 2. ACL Anthology corpus has been marked up with "valid terms" and
> "technology terms".
> Thanks to Behrang Qasemizadeh
>
> 3. Georgeta Bordea says:
> In our previous work [1] done in the context of the Saffron project [2] we
> were interested in cross-domain evaluation of term extraction. Because we
> did not find other datasets similar to GENIA we relied on datasets
> annotated for keyphrase extraction [3] and index term assignment [4]
> which are more abundant.
>
> [1]
> https://lipn.univ-paris13.fr/tia2013/Proceedings/actesTIA2013.pdf#page=59
> [2] http://saffron.deri.ie/
> [3] https://github.com/snkim/AutomaticKeyphraseExtraction
> [4] http://code.google.com/p/maui-indexer/wiki/Resources
>
>
> 4. Kevin Cohen and Sophia Ananiadou pointed to resources related to the
> Termine tool: however they did not include reference lists of 'gold
> standard' terms.
>
> 5. Viktor Pekar pointed to a SemEval task which included "aspect term
> extraction" in the domain of restaurant reviews, by which they mean
> "service" and "staff" in the sentence "I liked the service and the staff".
> See http://alt.qcri.org/semeval2014/task4/
> This wasn't quite what we were looking for.
>
> Thanks all
>
> Adam
>
> ========original post====================
> Date: Wed, 19 Feb 2014 11:34:36 +0000
> Subject: [Corpora-List] Resources for evaluating term extraction
>
> Dear all,
>
> The Sketch Engine now supports term extraction for many languages - and we
> want to evaluate it.
>
> For that, we need domain corpora in which somebody has gone through
> identifying all the 'true' terms. Then we can compute our system's
> precision and recall.
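The precision and recall computation mentioned above can be sketched as follows. This is only a minimal illustration, not how the Sketch Engine evaluation is actually implemented: the function name `precision_recall`, the exact-match comparison after lowercasing, and the example term lists are all assumptions made up for this sketch.

```python
def precision_recall(extracted, gold):
    """Score extracted terms against a gold-standard term list.

    Assumption: a term counts as correct only on an exact match
    after trimming whitespace and lowercasing.
    """
    extracted_set = {t.strip().lower() for t in extracted}
    gold_set = {t.strip().lower() for t in gold}
    true_positives = extracted_set & gold_set
    # Precision: share of extracted terms that are in the gold list.
    precision = len(true_positives) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of gold terms that the system found.
    recall = len(true_positives) / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical term lists, invented for illustration only.
gold = ["term extraction", "gold standard", "domain corpus", "recall"]
extracted = ["term extraction", "domain corpus", "mailing list"]

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; an evaluation might instead lemmatise terms or allow partial matches, which would change both figures.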
>
> We are aware of GENIA, for English, and are using that already (key
> citation here: A comparative evaluation of term recognition
> algorithms. 2008: Z Zhang, J Iria, CA Brewster, F Ciravegna)
>
> Any corpus with "the terms it contains", conscientiously produced, will
> help us.
>
> Pointers please!
>
> Adam
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/>
> adam at lexmasterclass.com
> Director Lexical Computing Ltd<http://www.sketchengine.co.uk/>
>
> Visiting Research Fellow University of Leeds<http://leeds.ac.uk>
>
> *Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
>
> *DANTE: a lexical database for English
> <http://www.webdante.com> *
> ========================================
>