[Corpora-List] New homepage for the GENIA project and biomedical annotated corpora

Paul Thompson Paul.Thompson at manchester.ac.uk
Thu Dec 22 11:13:13 UTC 2011


We are pleased to announce a new website for the GENIA project: http://www.nactem.ac.uk/genia/

The GENIA project has been running since 1998, and the new website contains information about the following:

* The GENIA corpus - the primary resource created by the GENIA project. The corpus is intended to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology. It consists of 1,999 MEDLINE abstracts, which have been annotated with several levels of linguistic and semantic information, namely parts of speech, syntax, terms, events, relations and coreference. The corpus can be downloaded from the website.

* Shared tasks - The GENIA project initiated the BioNLP Shared Task series and has organised a number of tasks in three shared task events, namely the BioNLP/JNLPBA Shared Task 2004 and the BioNLP Shared Tasks of 2009 and 2011.

* Other GENIA project corpora - A number of additional corpora have been annotated using extensions of the GENIA/BioNLP Shared Task event representation. These consist of event corpora of protein post-translational modifications (PTM), Type IV secretion systems, DNA methylation, mTOR pathways and "Exhaustive PTM".

* Related efforts - Work that is related to the GENIA project. This includes the meta-knowledge corpus - an extension of the GENIA event corpus which adds annotation about how events are to be interpreted according to their textual context.

Information about tools developed to perform automatic annotation, through training on the GENIA corpus, will be added to the site shortly.


--------

Paul Thompson
Research Associate
School of Computer Science
National Centre for Text Mining
Manchester Interdisciplinary Biocentre
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
Tel: 0161 306 3091
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/




