[Corpora-List] DDI corpus: An annotated corpus with pharmacological substances and drug–drug interactions

Wed Oct 30 16:34:00 UTC 2013

DDI corpus: An annotated corpus with pharmacological substances and
drug–drug interactions (http://labda.inf.uc3m.es/ddicorpus)

Managing drug–drug interactions (DDIs) is a critical issue because of the
overwhelming amount of information available on them. Natural Language
Processing (NLP) techniques can provide an interesting way to reduce the
time healthcare professionals spend reviewing the biomedical literature.
However, the shortage of annotated corpora for DDI extraction is the main
bottleneck in the development of NLP systems for this area of
pharmacovigilance. For this reason, we are pleased to announce that the DDI
corpus, an annotated corpus with pharmacological substances and drug–drug
interactions, is now available at http://labda.inf.uc3m.es/ddicorpus

The DDI corpus is made up of 792 texts selected from the DrugBank database
and a further 233 MedLine abstracts on the subject of DDIs. The corpus was
annotated with a total of 18,502 pharmacological substances and 5,028 DDIs,
including both pharmacokinetic (PK) and pharmacodynamic (PD) interactions.
To date, corpora annotated with DDIs have focused on PK DDIs, but not on PD
DDIs.

Annotation guidelines were developed by domain experts in order to ensure a
high-quality, reliable and accurate annotation of the corpus.
Pharmacological substances were classified according to four entity types:
drug (for generic drugs), brand (for trade drugs), group (for drug classes)
and drug_n (for active substances not approved for human use). DDIs were
also classified into four types: mechanism (for DDIs describing the way the
interaction occurs), effect (for DDIs describing the consequence of the
interaction), advice (for DDIs described by a recommendation or advice) and
int (for DDIs without any additional information). Inter-Annotator
Agreement (IAA) was measured to assess the consistency and quality of the
corpus. The agreement was almost perfect (Kappa up to 0.96 and generally
over 0.80), except for the DDIs in the MedLine database (0.55–0.72).
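
As an illustration of how an agreement figure of this kind can be computed,
here is a minimal Python sketch. It assumes Cohen's kappa for two annotators
(the announcement does not say which kappa variant was used), and the two
label sequences are invented for the example; they are not taken from the
corpus. Only the four DDI type names come from the description above.

```python
from collections import Counter

# The four DDI types named in the annotation guidelines.
DDI_TYPES = ("mechanism", "effect", "advice", "int")


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of eight DDIs (illustrative data only).
annotator_1 = ["mechanism", "effect", "effect", "advice",
               "int", "effect", "mechanism", "advice"]
annotator_2 = ["mechanism", "effect", "advice", "advice",
               "int", "effect", "mechanism", "effect"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

With these made-up labels the annotators agree on six of eight items, and
the chance-corrected value comes out lower than the raw agreement of 0.75.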

The DDI corpus was developed for the SemEval-2013 DDIExtraction 2013 task
(http://www.cs.york.ac.uk/semeval-2013/task9/), whose main goal was to
provide a common framework for the evaluation of information extraction
techniques applied to the recognition and classification of pharmacological
substances (DrugNER subtask) and the detection and classification of
drug–drug interactions (DDIExtraction subtask) from biomedical texts. The
DDI corpus is a valuable gold standard for research groups interested in
the recognition of pharmacologically active substances, including drugs,
groups of drugs, toxins, etc., or those specifically working in the field
of DDI relation extraction.

The DDI corpus is divided into two datasets: training and test. The
training dataset is the same for both subtasks and contains gold-standard
annotations of pharmacological substances and their interactions. It
consists of 714 texts (572 from DrugBank and 142 MedLine abstracts)
annotated with a total of 13,029 pharmacological substances (13,029 from
DrugBank and 1,826 from MedLine) and 4,037 DDIs (3,805 from DrugBank and 232
from MedLine). The test dataset for the DrugNER subtask consists of 52
DrugBank texts (annotated with 303 pharmacological substances) and 58
MedLine abstracts (with 382 pharmacological substances). The test dataset
for the DDIExtraction subtask consists of 158 DrugBank texts (annotated
with 889 DDIs) and 33 MedLine abstracts (with 95 DDIs).

We hope that the release of this dataset will encourage further research on
the DDI problem.
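
For readers who want the split at a glance, the document and DDI counts
quoted above can be tallied with a short Python sketch. The dictionary
layout and the helper name `total` are our own choices for illustration and
are not part of the corpus distribution; the figures are the ones given in
this announcement, restricted to text counts, DDI counts and the test-set
substance counts.

```python
# Counts as quoted in the announcement, keyed by source collection.
TRAINING = {
    "DrugBank": {"texts": 572, "ddis": 3805},
    "MedLine": {"texts": 142, "ddis": 232},
}
TEST_DRUGNER = {
    "DrugBank": {"texts": 52, "substances": 303},
    "MedLine": {"texts": 58, "substances": 382},
}
TEST_DDI = {
    "DrugBank": {"texts": 158, "ddis": 889},
    "MedLine": {"texts": 33, "ddis": 95},
}


def total(split, field):
    """Sum one field over both source collections of a split (hypothetical helper)."""
    return sum(source[field] for source in split.values())


print("training texts:", total(TRAINING, "texts"))
print("training DDIs:", total(TRAINING, "ddis"))
print("DrugNER test texts:", total(TEST_DRUGNER, "texts"))
print("DrugNER test substances:", total(TEST_DRUGNER, "substances"))
print("DDIExtraction test texts:", total(TEST_DDI, "texts"))
print("DDIExtraction test DDIs:", total(TEST_DDI, "ddis"))
```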

A detailed description of the DDI corpus and the DDIExtraction 2013 task
can be found in the following articles:

- María Herrero-Zazo, Isabel Segura-Bedmar, Paloma Martínez, Thierry
Declerck. The DDI corpus: An annotated corpus with pharmacological
substances and drug–drug interactions. Journal of Biomedical Informatics,
Volume 46, Issue 5, October 2013, Pages 914-920, ISSN 1532-0464,
http://dx.doi.org/10.1016/j.jbi.2013.07.011

- Isabel Segura-Bedmar, Paloma Martínez, María Herrero-Zazo. SemEval-2013
Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts
(DDIExtraction 2013). In Proceedings of the 7th International Workshop on
Semantic Evaluation (SemEval 2013).

Contact info:
Isabel Segura-Bedmar (isegura at inf.uc3m.es)

-- 
Isabel Segura Bedmar
Despacho 2.2.A.10, Telf: 91 624 99 88
Departamento de Informática, Universidad Carlos III de Madrid,
Laboratory for Advanced Database (LABDA)
http://labda.inf.uc3m.es/doku.php?id=en:inicio