[Corpora-List] Resources for evaluating term extraction

Adam Kilgarriff adam at lexmasterclass.com
Wed Feb 19 11:34:36 UTC 2014


Dear all,

The Sketch Engine now supports term extraction for many languages - and we
want to evaluate it.

For that, we need domain corpora in which somebody has gone through and
identified all the 'true' terms.  Then we can compute our system's
precision and recall against that gold standard.
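
To make the evaluation concrete, here is a minimal sketch of the computation, assuming the gold-standard terms and the system's extracted terms are each available as a plain list of strings. The function name and the matching rule (exact match after lowercasing and collapsing whitespace) are illustrative assumptions only, not a description of how the Sketch Engine works.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted terms against gold terms.

    Assumption: two terms match if they are identical after lowercasing
    and collapsing runs of whitespace. A real evaluation might instead
    match on lemmas or allow partial overlaps.
    """
    def normalise(term):
        return " ".join(term.lower().split())

    extracted_set = {normalise(t) for t in extracted}
    gold_set = {normalise(t) for t in gold}

    # True positives: extracted terms that also appear in the gold standard.
    true_positives = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Made-up example data, for illustration only.
system_terms = ["T cell", "the protein", "NF-kappa B"]
gold_terms = ["t cell", "NF-kappa B", "interleukin 2", "gene expression"]
p, r = precision_recall(system_terms, gold_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```

With this made-up data, two of the three extracted terms are in the gold list, and two of the four gold terms are found.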

We are aware of GENIA, for English, and are using that already (key
citation: Z Zhang, J Iria, CA Brewster, F Ciravegna, 2008: A comparative
evaluation of term recognition algorithms
<http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=VsRwsN8AAAAJ&citation_for_view=VsRwsN8AAAAJ:u5HHmVD_uO8C>).

Any corpus with "the terms it contains", conscientiously produced, will
help us.

Pointers please!

Adam

-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================