31.684, Calls: Comp Ling/Spain

The LINGUIST List linguist at listserv.linguistlist.org
Mon Feb 17 20:21:25 UTC 2020


LINGUIST List: Vol-31-684. Mon Feb 17 2020. ISSN: 1069 - 4875.

Subject: 31.684, Calls: Comp Ling/Spain

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Lauren Perkins <lauren at linguistlist.org>
================================================================


Date: Mon, 17 Feb 2020 15:20:31
From: Marie Candito [marie.candito at gmail.com]
Subject: PARSEME shared task 1.2 on semi-supervised identification of verbal multiword expressions

 
Full Title: PARSEME shared task 1.2 on semi-supervised identification of verbal multiword expressions 
Short Title: PARSEME ST 1.2 

Date: 13-Sep-2020 - 14-Sep-2020
Location: Barcelona, Spain 
Contact Person: Carlos Ramisch
Meeting Email: carlos.ramisch at lis-lab.fr
Web Site: http://multiword.sourceforge.net/sharedtask2020 

Linguistic Field(s): Computational Linguistics 

Call Deadline: 30-Apr-2020 

Meeting Description:

MWE-LEX 2020 will host edition 1.2 of the PARSEME shared task on
semi-supervised identification of verbal MWEs. This is a follow-up to editions
1.0 (2017) and 1.1 (2018). The latter covered 20 languages and received 17
submissions by 12 teams. Edition 1.2 will feature (a) improved and extended
corpora annotated with MWEs, (b) complementary unannotated corpora for
unsupervised MWE discovery, and (c) new evaluation metrics focusing on unseen
MWEs. Following the synergy with Elexis, our aim is to foster the development
of unsupervised methods for MWE lexicon induction, which in turn can be used
for identification. Authors may submit system description papers to a special
track, following common submission guidelines. Details will be available here
soon.


Second Call for Participation: 

The third edition of the PARSEME shared task addresses the automatic
identification of verbal multiword expressions (VMWEs) in running text. For
this edition, the emphasis is on discovering VMWEs that were not seen in the
training corpus.

We kindly ask potential participant teams to register using the expression of
interest form:
https://docs.google.com/forms/d/e/1FAIpQLSfcmbd6MmKjFuBxCoaTWGCPGqoH5FoJ-th8IAZk3kh_ECDaZQ/viewform?usp=sf_link

Task updates and questions will be posted on 
http://multiword.sourceforge.net/sharedtask2020 and on our public mailing list
(anyone can join): http://groups.google.com/group/verbalmwe

Provided corpora: 
Corpora are being prepared for the following languages: Bulgarian (BG),
Croatian (HR), German (DE), Greek (EL), Basque (EU), French (FR), Irish (GA),
Hebrew (HE), Hindi (HI), Hungarian (HU), Italian (IT), Polish (PL), Brazilian
Portuguese (PT), Romanian (RO), Swedish (SV), Turkish (TR), Chinese (ZH).

For each language, we will release:

On March 18, 2020: 
* a training corpus and a development corpus, manually annotated for VMWEs.
The provided annotations follow the PARSEME 1.1 guidelines:
https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.1/.
* a raw corpus, not annotated for VMWEs, to support semi- and unsupervised
methods for VMWE discovery 

On April 28, 2020:
* A blind test corpus to be used as input to the systems during the evaluation
phase

Morphosyntactic annotations (parts of speech, lemmas, morphological features,
and syntactic dependencies) are also provided, both for annotated and raw
corpora.  Depending on the language, the information comes from treebanks
(mostly Universal Dependencies v2) or from automatic parsers trained on UD v2
treebanks (e.g., UDPipe).
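The morphosyntactic layer follows the CoNLL-U column layout used by Universal
Dependencies (the PARSEME release may add task-specific columns on top of it).
As an informal sketch, reading the ten standard UD columns of one token line
could look like this; the field names come from the UD specification:

```python
# Sketch: parse one CoNLL-U-style token line into a dict.
# These are the ten standard Universal Dependencies columns; the
# PARSEME files may append extra task-specific columns (assumption).
UD_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
             "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

def parse_token_line(line):
    """Split a tab-separated CoNLL-U token line into named fields."""
    cols = line.rstrip("\n").split("\t")
    return dict(zip(UD_FIELDS, cols))

# An illustrative token line (tab-separated, underscores = empty).
token = parse_token_line("1\ttakes\ttake\tVERB\t_\t_\t0\troot\t_\t_")
print(token["LEMMA"], token["UPOS"])  # take VERB
```

The lemma and part-of-speech fields read here are the ones most relevant to
the seen/unseen VMWE distinction described below.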

A small **trial data set** is available on the shared task's release
repository: https://gitlab.com/parseme/sharedtask-data/-/tree/master/1.2/trial

Tracks: 
System results can be submitted in:
  * Closed track: Systems using only the provided data (training, dev, and raw
corpora)
  * Open track: Systems using any additional resources deemed useful
(lexicons, symbolic grammars, wordnets, other raw corpora, embeddings and
language models trained on external data, etc.), with or without the provided
training corpus. 

In both tracks, the use of the corpora from the previous PARSEME shared tasks
is strictly forbidden.

Evaluation metrics: 
The evaluation metrics will be the same as for the 1.1 edition, as described
in:
http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-MWE-CxG_2018___lb__COLING__rb__&subpage=CONF_50_Evaluation_metrics

For the 1.2 edition the published general ranking will emphasize 3 metrics:
   * global MWE-based
   * global Token-based
   * unseen MWE-based
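To illustrate the distinction between the first two metric families (a hedged
sketch, not the official evaluation script): MWE-based scoring counts a
predicted expression as correct only if its exact set of token positions
matches a gold expression, while token-based scoring gives partial credit for
overlapping tokens. Representing each MWE as a frozenset of token indices:

```python
# Sketch: MWE-based vs token-based precision/recall/F1.
# Each MWE is a frozenset of token indices within a sentence
# (an illustrative simplification, not the official scorer).

def prf(correct, n_pred, n_gold):
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def mwe_based(pred, gold):
    """Exact match: a prediction counts iff it equals a gold MWE."""
    return prf(len(pred & gold), len(pred), len(gold))

def token_based(pred, gold):
    """Partial credit: count tokens covered by both sides."""
    pred_tokens = set().union(*pred) if pred else set()
    gold_tokens = set().union(*gold) if gold else set()
    return prf(len(pred_tokens & gold_tokens),
               len(pred_tokens), len(gold_tokens))

gold = {frozenset({2, 3}), frozenset({7, 9})}
pred = {frozenset({2, 3}), frozenset({7})}
print(mwe_based(pred, gold))   # 1 of 2 predictions is an exact match
print(token_based(pred, gold)) # all 3 predicted tokens overlap gold
```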

A VMWE from the test corpus is "seen" if a VMWE with the same (multi-)set of
lemmas is annotated at least once in the training corpus.
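Under this definition, the seen/unseen distinction reduces to comparing
multisets of lemmas. A minimal sketch (the lemma extraction itself is assumed
to be given, e.g. from the provided morphosyntactic annotations):

```python
from collections import Counter

# Sketch: decide whether a test VMWE is "seen", i.e. whether a VMWE
# with the same (multi-)set of lemmas occurs in the training corpus.

def lemma_key(lemmas):
    """Order-independent, hashable multiset key for a VMWE's lemmas."""
    return frozenset(Counter(lemmas).items())

def is_seen(test_vmwe, train_vmwes):
    train_keys = {lemma_key(v) for v in train_vmwes}
    return lemma_key(test_vmwe) in train_keys

train = [["take", "decision"], ["pay", "attention"]]
print(is_seen(["decision", "take"], train))  # True: same lemma multiset
print(is_seen(["take", "breath"], train))    # False: unseen combination
```

The multiset (rather than plain set) matters for VMWEs with a repeated lemma,
which a set-based key would conflate with their single-occurrence variants.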

Corpus split: 
For each language, the annotated sentences will be shuffled and split in a way
that ensures a minimum of 300 VMWEs in the test set that are unseen in the
training + dev sets. 
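One hypothetical way to realize such a split (the actual procedure used by the
organizers is not specified here): shuffle the sentences, cut at a candidate
split point, and accept the split only if the test side contains enough VMWEs
whose lemma multiset never occurs on the train+dev side. All names and the
sentence data shape below are illustrative:

```python
import random
from collections import Counter

def lemma_key(lemmas):
    """Order-independent multiset key for a VMWE's lemmas."""
    return frozenset(Counter(lemmas).items())

def split_with_unseen(sentences, min_unseen=300, test_frac=0.1, seed=0):
    """Shuffle and split so the test portion has at least `min_unseen`
    VMWEs unseen in train+dev. Each sentence is assumed to be a pair
    (text, list_of_vmwe_lemma_lists) -- an illustrative shape."""
    rng = random.Random(seed)
    sents = list(sentences)
    for _ in range(1000):  # bounded number of reshuffle attempts
        rng.shuffle(sents)
        cut = int(len(sents) * (1 - test_frac))
        train_dev, test = sents[:cut], sents[cut:]
        seen = {lemma_key(v) for _, vs in train_dev for v in vs}
        unseen = sum(1 for _, vs in test for v in vs
                     if lemma_key(v) not in seen)
        if unseen >= min_unseen:
            return train_dev, test
    raise ValueError("could not satisfy the unseen-VMWE constraint")
```

A rejection loop like this is simple but can fail when the corpus has too few
distinct VMWE types; a real split would likely place rare types deliberately.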

Important dates: 
  * Feb 17, 2020: trial data and evaluation script released
  * Mar 18: training and development corpus + raw corpus released
  * Apr 28: blind test corpus released
  * Apr 30: submission of system results
  * May 06: announcement of results
  * May 20: shared task system description papers due (same as regular papers)
  * Jun 24: notification of acceptance
  * Jul 11: camera-ready system description papers due
  * Sep 14: shared task session at the MWE-LEX 2020 workshop at Coling 2020

Organizing team: 
Carlos Ramisch, Marie Candito, Bruno Guillaume, Agata Savary, Ashwini Vaidya,
and Jakub Waszczuk

Contact: parseme-st-core at nlp.ipipan.waw.pl




------------------------------------------------------------------------------



----------------------------------------------------------
LINGUIST List: Vol-31-684	
----------------------------------------------------------
