[Corpora-List] spotting names of drugs

Martin Krallinger mkrallinger at cnio.es
Tue Jan 8 16:07:36 UTC 2013


Dear Pete Whitelock,

we are actually organizing a named entity recognition challenge related
precisely to your question: the CHEMDNER task of BioCreative IV (I have
added the CFP below).

Several approaches are used to identify drug names and chemical compounds:
some are based on dictionaries, others on rules or machine learning
methods. A review summarizing these methods is:

Vazquez, M., Krallinger, M., Leitner, F., & Valencia, A. (2011). Text
Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.
Molecular Informatics, 30(6‐7), 506-519.
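
As a rough illustration of the rule-based end of that spectrum, the sketch
below flags a string as drug-like when it ends in a common generic-name
stem. The stem list and the function name looks_like_drug are assumptions
made for this example; they are not taken from the review or from any
CHEMDNER system, and a real system would need a far larger curated rule
set or a trained model.

```python
# Hypothetical, hand-picked list of generic-name suffixes (illustrative only).
DRUG_SUFFIXES = (
    "mab", "pril", "olol", "sartan", "cillin",
    "azepam", "prazole", "tinib", "vastatin",
)


def looks_like_drug(token: str) -> bool:
    """Return True if the token ends in one of the illustrative suffixes."""
    word = token.strip().lower()
    # Require a few characters before the suffix so that a bare stem
    # such as "mab" on its own is not accepted.
    return any(
        word.endswith(suffix) and len(word) > len(suffix) + 2
        for suffix in DRUG_SUFFIXES
    )


if __name__ == "__main__":
    for candidate in ["imatinib", "Atorvastatin", "table", "carpool"]:
        print(candidate, looks_like_drug(candidate))
```

A suffix rule like this only looks at the string in isolation, so it will
miss names without a recognisable stem and may accept ordinary words that
happen to share an ending.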

Regarding a dictionary for drug names, several databases provide lists of
names (DrugBank, ChEBI, PubChem, ...). A more integrated resource is JOCHEM
(http://www.biosemantics.org/index.php?page=Jochem).

Regards,


Martin Krallinger



--------------------------------------------

CALL FOR PARTICIPATION: CHEMDNER task: Chemical compound and drug name
recognition task.

(http://www.biocreative.org/tasks/biocreative-iv/chemdner)

TASK GOAL AND MOTIVATION
------------------------
The goal of this task is to promote the implementation of systems that are
able to detect mentions in text of chemical compounds and drugs. The
recognition of chemical entities is also crucial for other subsequent text
processing strategies, such as detection of drug-protein interactions,
adverse effects of chemical compounds or the extraction of pathway and
metabolic reaction relations. A range of different methods have been
explored for the recognition of chemical compound mentions, including
machine learning based approaches, rule-based systems and different types
of dictionary-lookup strategies.

We foresee considerable interest in the results of this task from the
NLP/text mining community on one side, and from the bioinformatics,
drug discovery/biomedicine and chemoinformatics communities on the other.
As has been the case in previous BioCreative efforts (resulting in
high impact papers in the field), we expect that successful participants
will have the opportunity to publish their system descriptions in a journal
article.

CHEMDNER TRACK DESCRIPTION
--------------------------
CHEMDNER is one of the tracks of the BioCreative IV community
challenge (http://www.biocreative.org).

We invite participants to submit results for the CHEMDNER task providing
predictions for one or both of the following subtasks:

a) Given a set of documents, return for each of them a ranked list of
chemical entities described within each of these documents [Chemical
document indexing sub-task]

b) Provide for a given document the start and end indices corresponding to
all the chemical entities mentioned in this document [Chemical entity
mention recognition sub-task].
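
To make the two output types concrete, the sketch below runs a naive
dictionary lookup over one document and produces both kinds of result. The
tiny lexicon, the function names, and the choice to rank entities by
mention count are all assumptions for illustration; the official output
format and ranking criteria are defined by the task organizers, not here.

```python
import re
from collections import Counter

# Toy lexicon (assumption); a real one might be built from the name
# lists in DrugBank, ChEBI or PubChem.
LEXICON = ["aspirin", "ibuprofen", "acetylsalicylic acid"]

# Longest names first so multi-word names win over shorter overlaps.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_mentions(text):
    """Sub-task (b) style: (start, end, surface string), end exclusive."""
    return [(m.start(), m.end(), m.group(0)) for m in _PATTERN.finditer(text)]


def rank_entities(text):
    """Sub-task (a) style: unique entities, ranked here by mention count."""
    counts = Counter(surface.lower() for _, _, surface in find_mentions(text))
    return [name for name, _ in counts.most_common()]


if __name__ == "__main__":
    doc = ("Aspirin (acetylsalicylic acid) was compared with ibuprofen; "
           "aspirin was better tolerated.")
    print(find_mentions(doc))
    print(rank_entities(doc))
```

Character offsets are reported with an exclusive end index, so
doc[start:end] gives back the matched string; whether the task expects
inclusive or exclusive end indices is not stated above.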

For these two tasks the organizers will release training and test data
collections. The task organizers will also provide details on the
annotation guidelines used, and define criteria for relevant chemical
compound entity types as well as for the selection of documents for
annotation.

REGISTRATION
------------
Teams can participate in the CHEMDNER task by registering for track 2 of
BioCreative IV. You can register additionally for other tracks too. To
register your team go to the following page that provides more detailed
instructions: http://www.biocreative.org/news/biocreative-iv/team/

MAILING LIST AND CONTACT INFORMATION
------------------------------------
You can post questions related to the CHEMDNER task to the BioCreative
mailing list. To register for the BioCreative mailing list, please visit
the following page: http://biocreative.sourceforge.net/mailing.html

WORKSHOP
--------
CHEMDNER is part of the BioCreative evaluation effort. The BioCreative
Organizing Committee will host the BioCreative IV Challenge evaluation
workshop (http://www.biocreative.org/events/biocreative-iv/CFP/) at NCBI,
National Institutes of Health, Bethesda, Maryland, on October 7-9, 2013.


CHEMDNER TASK ORGANIZERS
-------------------------
Martin Krallinger, Spanish National Cancer Research Center (CNIO)
Obdulia Rabal, University of Navarra, Spain
Julen Oyarzabal, University of Navarra, Spain
Alfonso Valencia, Spanish National Cancer Research Center (CNIO)

REFERENCES
----------
- Vazquez, M., Krallinger, M., Leitner, F., & Valencia, A. (2011). Text
Mining for Drugs and Chemical Compounds: Methods, Tools and Applications.
Molecular Informatics, 30(6‐7), 506-519.
- Corbett, P., Batchelor, C., & Teufel, S. (2007). Annotation of chemical
named entities. BioNLP 2007: Biological, translational, and clinical
language processing, 57-64.
- Klinger, R., Kolářik, C., Fluck, J., Hofmann-Apitius, M., & Friedrich, C.
M. (2008). Detection of IUPAC and IUPAC-like chemical names.
Bioinformatics, 24(13), i268-i276.
- Hettne, K. M., Stierum, R. H., Schuemie, M. J., Hendriksen, P. J.,
Schijvenaars, B. J., Mulligen, E. M. V., ... & Kors, J. A. (2009). A
dictionary to identify small molecules and drugs in free text.
Bioinformatics, 25(22), 2983-2991.
- Yeh, A., Morgan, A., Colosimo, M., & Hirschman, L. (2005). BioCreAtIvE
task 1A: gene mention finding evaluation. BMC bioinformatics, 6(Suppl 1),
S2.
- Smith, L., Tanabe, L. K., Ando, R. J., Kuo, C. J., Chung, I. F., Hsu, C.
N., ... & Wilbur, W. J. (2008). Overview of BioCreative II gene mention
recognition. Genome Biology, 9(Suppl 2), S2.



On Tue, Jan 8, 2013 at 4:45 PM, WHITELOCK, Pete <pete.whitelock at oup.com> wrote:

> I’m interested in the problem of spotting that a particular string that’s
> not in one’s dictionary is in fact the name of a drug. New drugs and their
> names are being created all the time and it’s pretty easy as a human to see
> a string in isolation and see “yeh, that’s a drug name”. Anyone done
> anything similar to this? I vaguely recall some discussion of
> distinguishing boys’ and girls’ names (as an exercise in some textbook?).
>
> In addition, does anyone know where to get a list of drug names to use as
> the starting point?
>
> Thanks for any help
>
> Pete Whitelock, PhD
> Principal Language Engineer, Technology
> Academic Dictionaries
> Oxford University Press
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>