28.2406, Confs: Comp Ling/Spain

Wed May 31 16:10:49 UTC 2017

LINGUIST List: Vol-28-2406. Wed May 31 2017. ISSN: 1069 - 4875.

Subject: 28.2406, Confs: Comp Ling/Spain

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================

Date: Wed, 31 May 2017 12:10:42
From: Martin Krallinger [mkrallinger at cnio.es]
Subject: IberEval 2017 - BARR track (Biomedical Abbreviation Recognition and Resolution)

IberEval 2017 - BARR track (Biomedical Abbreviation Recognition and Resolution) 
Short Title: BARR 

Date: 19-Sep-2017 - 19-Sep-2017 
Location: Murcia, Spain 
Contact: Martin Krallinger 
Contact Email: mkrallinger at cnio.es 
Meeting URL: http://temu.inab.org/ 

Linguistic Field(s): Computational Linguistics 

Subject Language(s): Spanish (spa)

Meeting Description: 

Overview:

The recognition and resolution of abbreviations, acronyms and symbols is a
critical step for a number of tasks including named entity recognition (NER),
machine translation, information retrieval/indexing and document
categorization among others. Therefore the implementation and availability of
abbreviation recognition systems is of great practical impact for text mining
and language processing.

In case of domains such as biomedicine and clinical research, abbreviations
are particularly frequent, often referring to entities and concepts of
importance such as genes, diseases, symptoms, drugs/chemicals or treatments.
NER, relation extraction and clinical document coding systems usually need to
cope with recognizing correctly short forms or abbreviations.

Abbreviations can be regarded as a ShortForm (SF) that denotes a longer word
or phrase (LongForm, LF), typically its definition. Different strategies have
been tested to detect short forms in English biomedical texts (Torii et al.,
2007), using for instance alignment-based approaches, machine learning methods
or rule-based strategies and some manually annotated corpora do exist (e.g.
MEDSTRACT, Ab3P or BIOADI, see Islamaj Doğan et al., 2014). Far less effort
has been made to detect short form- long form pairs in text written in other
languages.

There is a growing number of biomedical and clinical documents written in
Spanish, such as medical literature, medical agency reports, patents and
particularly electronic health records. Moreover, according to some estimates
there are over 500 million Spanish speakers worldwide.

As part of the IBEREVAL 2017
(http://cabrillo.lsi.uned.es/nlp/IberEval-2017/index.php) initiative we have
proposed the Biomedical Abbreviation Recognition and Resolution (BARR) track
with the aim promoting the development and evaluation of biomedical
abbreviation identification systems.

For this track, participating teams have to detect mentions of pairs of Short
Forms and their corresponding Long Forms from medical article abstracts
written in Spanish. BARR track organizers provide a manually labeled training
set exhaustively tagged with Short Form-Long Form pairs (in addition to other
abbreviations).

This track is particularly interesting as some abstracts were manually
transcribed, resembling preprocessing characteristics also found in clinical
documents. In addition to the BARR manually annotated Gold Standard corpora,
the BARR document collection will be released, consisting in the largest
existing unified collection of medical article abstracts written in Spanish
distributed through a special agreement with the publisher Elsevier to
participating teams.

Important tentative dates:

- March 28th, 2017: Release of sample data
- May 19th, 2017: Release of training data subset1
- May 30th, 2017: Release of training data (full set)
- June 19th, 2017: Release of Testing Set
- June 24th, 2017: Submission of participant runs
- June 26th, 2017: Working notes submission due (short system description 3-5
pages)
- June 28th, 2017: Reviews of Working notes sent out to authors
- July 1st, 2017: Deadline to submit Camera ready revised Working notes
- September 19th, 2017: Workshop at SEPLN 2017

------------------------------------------------------------------------------

*****************    LINGUIST List Support    *****************
Please support the LL editors and operation with a donation at:
            http://funddrive.linguistlist.org/donate/

----------------------------------------------------------
LINGUIST List: Vol-28-2406	
----------------------------------------------------------