30.4418, Calls: Computational Linguistics/Spain

The LINGUIST List linguist at listserv.linguistlist.org
Thu Nov 21 09:59:53 UTC 2019


LINGUIST List: Vol-30-4418. Thu Nov 21 2019. ISSN: 1069 - 4875.

Subject: 30.4418, Calls: Computational Linguistics/Spain

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Thu, 21 Nov 2019 04:53:31
From: Agata Savary [agata.savary at univ-tours.fr]
Subject: PARSEME Shared Task 1.2 on Semi-supervised Identification of Verbal Multiword Expressions

 
Full Title: PARSEME shared task 1.2 on semi-supervised identification of verbal multiword expressions 
Short Title: PARSEME ST 1.2

Date: 13-Sep-2020 - 14-Sep-2020
Location: Barcelona, Spain 
Contact Person: Carlos Ramisch
Meeting Email: carlos.ramisch at lis-lab.fr
Web Site: http://multiword.sourceforge.net/sharedtask2020 

Linguistic Field(s): Computational Linguistics 

Call Deadline: 30-Apr-2020 

Meeting Description:

MWE-LEX 2020 will host edition 1.2 of the PARSEME shared task on
semi-supervised identification of verbal MWEs. It follows editions 1.0
(2017) and 1.1 (2018); the latter covered 20 languages and received 17
submissions from 12 teams. Edition 1.2 will feature (a) improved and extended
corpora annotated with MWEs, (b) complementary unannotated corpora for
unsupervised MWE discovery, and (c) new evaluation metrics focusing on unseen
MWEs. In line with the synergy with Elexis, our aim is to foster the
development of unsupervised methods for MWE lexicon induction, which can in
turn be used for identification. Authors may submit system description papers
to a special track, following the common submission guidelines. Details will
be available on the shared task website soon.


Call for Papers:

The third edition of the PARSEME shared task on automatic identification of
verbal multiword expressions (VMWEs) aims to identify verbal MWEs in running
texts. Verbal MWEs include, among others, idioms (to let the cat out of the
bag), light-verb constructions (to make a decision), verb-particle
constructions (to give up), multi-verb constructions (to make do) and
inherently reflexive verbs (s'évanouir 'to faint' in French). Their
identification is a well-known challenge for NLP applications because of
their complex characteristics, including discontinuity, non-compositionality,
heterogeneity and syntactic variability.

Previous editions have shown that, while some systems reach high performance
(F1 > 0.7) when identifying VMWEs seen in the training data, performance on
unseen VMWEs is very low. Hence, for this third edition, **emphasis will be
put on discovering VMWEs that were not seen in the training data**.

We kindly ask potential participant teams to register using the expression of
interest form:
https://docs.google.com/forms/d/e/1FAIpQLSfcmbd6MmKjFuBxCoaTWGCPGqoH5FoJ-th8IAZk3kh_ECDaZQ/viewform?usp=sf_link

Task updates and questions will be posted on the shared task website:
http://multiword.sourceforge.net/sharedtask2020
and announced on our public mailing list:
http://groups.google.com/group/verbalmwe


Provided data:

For each language, we provide participants with corpora in which VMWEs are
annotated according to the 1.1 shared task guidelines
(http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1).

On March 18th, we will release, for each language:
- A training corpus manually annotated for VMWEs;
- A development corpus to tune/optimize the systems' parameters;
- A larger raw corpus, to favor semi- and unsupervised methods for VMWE
discovery.

On April 28th, we will release, for each language:
- A blind test corpus to be used as input to the systems during the evaluation
phase, while its VMWE annotations are kept secret.

When available, morphosyntactic data (parts of speech, lemmas, morphological
features and/or syntactic dependencies) are also provided, for both annotated
and raw corpora. Depending on the language, this information comes from
treebanks (e.g., Universal Dependencies) or from automatic parsers trained on
treebanks (e.g., UDPipe).

So far we plan to include data for the following languages:
Bulgarian (BG), German (DE), Greek (EL), Basque (EU), French (FR), Hebrew
(HE), Hindi (HI), Croatian (HR), Hungarian (HU), Polish (PL), Brazilian
Portuguese (PT), Romanian (RO), Swedish (SV).

The amount of annotated data depends on the language.

Tracks:

System results can be submitted in two tracks:
- Closed track: Systems using only the provided training and development data
(with VMWE annotations and the provided morphosyntactic annotations) plus the
provided raw corpora.
- Open track: Systems that may use the provided training data or not, along
with any additional resources deemed useful (MWE lexicons, symbolic grammars,
wordnets, other raw corpora, word embeddings and language models trained on
external data, etc.). However, the use of corpora from previous shared task
editions is strictly forbidden. This track notably includes purely symbolic
and rule-based systems.

At submission time, teams submitting systems in the open track will be asked
to describe and provide references for all resources used. Teams are
encouraged to favor freely available resources for better reproducibility of
their results.



