29.1570, Calls: Computational Linguistics/Germany

The LINGUIST List linguist at listserv.linguistlist.org
Tue Apr 10 22:19:19 UTC 2018


LINGUIST List: Vol-29-1570. Tue Apr 10 2018. ISSN: 1069 - 4875.

Subject: 29.1570, Calls: Computational Linguistics/Germany

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================


Date: Tue, 10 Apr 2018 18:18:51
From: Veronika Vincze [vinczev at inf.u-szeged.hu]
Subject: Shared Task on Automatic Identification of Verbal Multiword Expressions

 
Full Title: Shared Task on Automatic Identification of Verbal Multiword Expressions 

Date: 25-Aug-2018 - 26-Aug-2018
Location: Santa Fe, New Mexico, USA 
Contact Person: Agata Savary
Meeting Email: agata.savary at univ-tours.fr
Web Site: http://multiword.sourceforge.net/sharedtask2018 

Linguistic Field(s): Computational Linguistics 

Call Deadline: 04-May-2018 

Meeting Description:

The second edition of the PARSEME shared task on automatic identification of
verbal multiword expressions (VMWEs) aims at identifying verbal MWEs in
running texts.  Verbal MWEs include, among others, idioms (*to let the cat out
of the bag*), light verb constructions (*to make a decision*), verb-particle
constructions (*to give up*), multi-verb constructions (*to make do*) and
inherently reflexive verbs (*se suicider* 'to commit suicide' in French).  Their
identification is a well-known challenge for NLP applications, due to their
complex characteristics including discontinuity, non-compositionality,
heterogeneity and syntactic variability.

The shared task is highly multilingual: PARSEME members have elaborated
annotation guidelines based on annotation experiments in about 20 languages
from several language families.  These guidelines take both universal and
language-specific phenomena into account.  We hope that this will boost the
development of language-independent and cross-lingual VMWE identification
systems.


2nd Call for Participation:

Participation:

Participation is open and free worldwide.

Publication and workshop:

Shared task participants will be invited to submit a system description paper
to a special track of the Joint Workshop on Linguistic Annotation, Multiword
Expressions and Constructions (LAW-MWE-CxG-2018) at COLING 2018, to be held on
August 25-26, 2018, in Santa Fe, New Mexico, USA:
http://multiword.sourceforge.net/lawmwecxg2018

Provided data:

For each language, we provide participants with corpora in which VMWEs are
annotated according to universal guidelines:

- Manually annotated **training corpora**, made available to the participants
in advance so that they can train their systems.
- Manually annotated **development corpora**, also made available in advance,
for tuning and optimizing the systems' parameters.
- Raw (unannotated) **test corpora**, to be used as input to the systems during
the evaluation phase. The VMWE annotations of these corpora will be kept secret.

The training and development sets are available at:
https://gitlab.com/parseme/sharedtask-data/tree/master/1.1

When available, morphosyntactic data (parts of speech, lemmas, morphological
features and/or syntactic dependencies) are also provided.  Depending on the
language, the information comes from treebanks (e.g., Universal Dependencies)
or from automatic parsers trained on treebanks (e.g., UDPipe).

We have prepared corpora for the following languages: Bulgarian (BG), German
(DE), Greek (EL), English (EN), Spanish (ES), Farsi (FA), French (FR), Hindi
(HI), Croatian (HR), Hungarian (HU), Italian (IT), Lithuanian (LT), Polish
(PL), Brazilian Portuguese (PT), Romanian (RO), Slovene (SL), Turkish (TR).
The amount of annotated data depends on the language.
The release of the Basque and Hebrew data has been postponed until around
April 11.

Important Dates:

April 4, 2018: shared task training data released
April 30, 2018: shared task blind test data released
May 4, 2018: submission of system results
May 11, 2018: announcement of results
May 25, 2018: submission of system description papers
June 20, 2018: notification
June 30, 2018: camera-ready papers
August 25-26, 2018: shared task workshop co-located with LAW-MWE-CxG-2018

Organizing team:

Silvio Ricardo Cordeiro, Carlos Ramisch, Agata Savary, Veronika Vincze

Contact: parseme-st-core at nlp.ipipan.waw.pl



