[Corpora-List] The DDI corpus

Isabel Segura isegura at inf.uc3m.es
Tue Nov 5 14:36:14 UTC 2013


*Sorry, there is a mistake in my previous email.* I meant to say: "We are
pleased to announce that the DDI corpus, an annotated corpus with
pharmacological substances and drug-drug interactions, is NOW
available at http://labda.inf.uc3m.es/ddicorpus."


On 4 November 2013 10:04, Isabel Segura <isegura at inf.uc3m.es> wrote:

> We are pleased to announce that the DDI corpus, an annotated corpus with
> pharmacological substances and drug-drug interactions, is not available at
> http://labda.inf.uc3m.es/ddicorpus.
>
> The management of drug–drug interactions (DDIs) is a critical issue
> because of the overwhelming amount of information available about them.
> Natural Language Processing (NLP) techniques can offer a useful way to
> reduce the time healthcare professionals spend reviewing the biomedical
> literature. However, the shortage of annotated corpora for DDI extraction
> is the main bottleneck in developing NLP systems for this area of
> pharmacovigilance. For precisely this reason, we are pleased to announce
> that the DDI corpus, an annotated corpus with pharmacological substances
> and drug-drug interactions, is now available at
> http://labda.inf.uc3m.es/ddicorpus.
>
> The DDI corpus is made up of 792 texts selected from the DrugBank database
> and 233 MedLine abstracts on the subject of DDIs. The corpus was
> annotated with a total of 18,502 pharmacological substances and 5,028 DDIs,
> including both pharmacokinetic (PK) and pharmacodynamic (PD)
> interactions. Previous corpora annotated with DDIs have focused on PK
> DDIs, not on PD DDIs.
>
> Annotation guidelines were developed by domain experts to ensure a
> reliable, high-quality and accurate annotation of the corpus.
> Pharmacological substances were classified into four entity types:
> drug (for generic drugs), brand (for drugs referred to by a trade name),
> group (for drug classes) and drug_n (for active substances not approved
> for human use). DDIs were also classified into four types: mechanism (for
> DDIs describing how the interaction occurs), effect (for DDIs describing
> the consequence of the interaction), advice (for DDIs expressed as a
> recommendation or advice) and int (for DDIs stated without any additional
> information). Inter-Annotator Agreement (IAA) was measured to assess the
> consistency and quality of the corpus. Agreement was almost perfect
> (Kappa up to 0.96 and generally above 0.80), except for the DDIs in the
> MedLine abstracts (0.55–0.72).
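
The announcement reports agreement only as "Kappa". As a minimal sketch, assuming Cohen's kappa for two annotators who labeled the same aligned items, the statistic is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name and the toy labels below are hypothetical and are not taken from the corpus.

```python
from collections import Counter

# Entity types defined in the DDI corpus annotation guidelines.
ENTITY_TYPES = ("drug", "brand", "group", "drug_n")


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same aligned items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement implied by each annotator's label
    frequencies. Assumes Cohen's variant; the announcement does not say
    which kappa was used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical entity-type labels from two annotators for six mentions.
annotator_1 = ["drug", "drug", "brand", "group", "drug_n", "drug"]
annotator_2 = ["drug", "group", "brand", "group", "drug_n", "drug"]
assert set(annotator_1 + annotator_2) <= set(ENTITY_TYPES)
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.769
```

Here the annotators agree on 5 of 6 mentions (p_o = 5/6), chance agreement is 10/36, and kappa works out to 10/13, which shows how a single disagreement pulls kappa well below raw percent agreement.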
>
> The DDI corpus was developed for SemEval-2013 Task 9 (DDIExtraction 2013,
> http://www.cs.york.ac.uk/semeval-2013/task9/), whose main goal was to
> provide a common framework for evaluating information extraction
> techniques applied to the recognition and classification of
> pharmacological substances (DrugNER subtask) and to the detection and
> classification of drug-drug interactions (DDIExtraction subtask) in
> biomedical texts. The DDI corpus is a valuable gold standard for research
> groups interested in recognizing pharmacologically active substances
> (drugs, groups of drugs, toxins, etc.) or working specifically on DDI
> relation extraction.
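
A common way to cast the DDIExtraction subtask, assumed here rather than stated in this announcement, is to treat every pair of pharmacological substances mentioned in the same sentence as a candidate and label it with one of the four DDI types or as "no interaction". The `Entity` class, the helper functions and the example sentence are hypothetical illustrations.

```python
from dataclasses import dataclass
from itertools import combinations

# DDI types defined in the DDI corpus annotation guidelines.
DDI_TYPES = ("mechanism", "effect", "advice", "int")


@dataclass(frozen=True)
class Entity:
    """A pharmacological substance mention (hypothetical structure)."""
    id: str
    text: str
    type: str  # drug, brand, group or drug_n


def candidate_pairs(entities):
    """Every unordered pair of mentions in one sentence.

    Entities are assumed to be listed in the order they occur in the text,
    so each pair comes out as (earlier mention, later mention).
    """
    return list(combinations(entities, 2))


def label_pairs(entities, gold):
    """Label each candidate pair with its DDI type, or None for no DDI.

    `gold` maps (e1_id, e2_id) to one of DDI_TYPES.
    """
    labeled = []
    for e1, e2 in candidate_pairs(entities):
        label = gold.get((e1.id, e2.id))
        if label is not None and label not in DDI_TYPES:
            raise ValueError(f"unknown DDI type: {label}")
        labeled.append((e1.text, e2.text, label))
    return labeled


# Invented sentence: "DrugX increases the plasma concentration of DrugY
# and other Z-blockers."
mentions = [
    Entity("e0", "DrugX", "drug"),
    Entity("e1", "DrugY", "drug"),
    Entity("e2", "Z-blockers", "group"),
]
for pair in label_pairs(mentions, {("e0", "e1"): "mechanism"}):
    print(pair)
# ('DrugX', 'DrugY', 'mechanism')
# ('DrugX', 'Z-blockers', None)
# ('DrugY', 'Z-blockers', None)
```

With n mentions in a sentence this yields n(n-1)/2 candidates, most of them negative, which is one reason DDI extraction systems often have to cope with class imbalance.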
>
> The DDI corpus is divided into two datasets: training and test. The
> training dataset is the same for both subtasks and contains gold-standard
> annotations of pharmacological substances and their interactions. It
> consists of 714 texts (572 from DrugBank and 142 MedLine abstracts)
> annotated with a total of 13,029 pharmacological substances (13,029 from
> DrugBank and 1,826 from MedLine) and 4,037 DDIs (3,805 from DrugBank and
> 232 from MedLine). The test dataset for the DrugNER subtask consists of 52
> DrugBank texts (annotated with 303 pharmacological substances) and 58
> MedLine abstracts (with 382 pharmacological substances). The test dataset
> for the DDIExtraction subtask consists of 158 DrugBank texts (annotated
> with 889 DDIs) and 33 MedLine abstracts (with 95 DDIs).
>
> We hope that the release of this dataset will encourage further research
> on the DDI problem.
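
To check per-dataset figures like these against the released files, one could tally entity and DDI types directly from the XML. The sketch below assumes a document/sentence/entity/pair layout (with `type`, `ddi` and `charOffset` attributes) like that of the SemEval-2013 release; verify it against the files you actually download. The inline sample is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample in the assumed layout; charOffset end positions are
# assumed to be inclusive.
SAMPLE = """<document id="doc.d1">
  <sentence id="doc.d1.s0" text="DrugX increases the plasma concentration of DrugY and other Z-blockers.">
    <entity id="doc.d1.s0.e0" charOffset="0-4" type="drug" text="DrugX"/>
    <entity id="doc.d1.s0.e1" charOffset="44-48" type="drug" text="DrugY"/>
    <entity id="doc.d1.s0.e2" charOffset="60-69" type="group" text="Z-blockers"/>
    <pair id="doc.d1.s0.p0" e1="doc.d1.s0.e0" e2="doc.d1.s0.e1" ddi="true" type="mechanism"/>
    <pair id="doc.d1.s0.p1" e1="doc.d1.s0.e0" e2="doc.d1.s0.e2" ddi="false"/>
    <pair id="doc.d1.s0.p2" e1="doc.d1.s0.e1" e2="doc.d1.s0.e2" ddi="false"/>
  </sentence>
</document>"""


def count_annotations(root):
    """Count entity mentions by type and positive DDI pairs by type."""
    entity_counts = Counter(e.get("type") for e in root.iter("entity"))
    ddi_counts = Counter(
        p.get("type") for p in root.iter("pair") if p.get("ddi") == "true"
    )
    return entity_counts, ddi_counts


root = ET.fromstring(SAMPLE)
entities, ddis = count_annotations(root)
print(dict(entities))  # {'drug': 2, 'group': 1}
print(dict(ddis))      # {'mechanism': 1}
```

For a downloaded file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; summing the counters over all files in a split gives that split's totals.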
>
> A detailed description of the DDI corpus and the DDIExtraction 2013 task
> can be found in the following articles:
>
> - María Herrero-Zazo, Isabel Segura-Bedmar, Paloma Martínez, Thierry
> Declerck. The DDI corpus: An annotated corpus with pharmacological
> substances and drug–drug interactions. Journal of Biomedical Informatics,
> Volume 46, Issue 5, October 2013, Pages 914-920. ISSN 1532-0464.
> http://dx.doi.org/10.1016/j.jbi.2013.07.011
>
> - Isabel Segura-Bedmar, Paloma Martínez, María Herrero-Zazo. SemEval-2013
> Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts
> (DDIExtraction 2013). In Proceedings of the 7th International Workshop on
> Semantic Evaluation (SemEval 2013).
>
>
> *Contact info:* Isabel Segura-Bedmar (isegura at inf.uc3m.es)
>
>
> --
> Isabel Segura Bedmar
> Office 2.2.A.10, Tel.: 91 624 99 88
> Departamento de Informática, Universidad Carlos III de Madrid,
> Laboratory for Advanced Database (LABDA)
> http://labda.inf.uc3m.es/doku.php?id=en:inicio
>
>


Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

