Info: Data release, SPMRL 2014 Shared Task (Statistical Parsing of Morphologically-Rich Languages)

Thierry Hamon hamon at LIMSI.FR
Tue May 27 20:20:46 UTC 2014

Date: Mon, 26 May 2014 06:06:34 +0200
From: Djamé Seddah <djame.seddah at>
Message-Id: <CCCC9737-501E-4D5E-9A65-55FA7CEF8B83 at>

*** apologies for cross-posting *** 

============== SPMRL 2014 SHARED TASK ============== 
The first joint SPMRL-SANCL workshop, co-located with COLING 2014,
hosts the second Shared Task on Parsing Morphologically-Rich
Languages.

===== Summary ===== 
Following the success of the first shared task, a second edition is
being launched, with an emphasis this year on the use of
semi-supervised techniques. On top of the treebank data (phrase
structures and dependencies), large annotated data sets are made
available.

Test submission deadline: July 21, 2014
(test data release: July 14, 2014)

===== Introduction =====
The primary goal of the First Shared Task on Parsing
Morphologically-Rich Languages was to bring forward work on parsing
morphologically ambiguous input in both dependency and constituency
parsing, and to show the state of the art for MRLs. In the longer term,
we aim to provide streamlined data sets and evaluation metrics, thus
improving the comparability of cross-linguistic work on parsing MRLs.

The 2014 Shared Task edition will explicitly allow and favor the use of
large unlabeled data sets. In order to properly evaluate the improvement
brought by the use of semi-supervised models, all annotated data and the
evaluation process will remain the same as in the first edition.

The shared task features tracks in constituency parsing and in
dependency parsing, in gold as well as in realistic scenarios (the
realistic scenario has no gold tokenization and no gold part-of-speech
tags or morphological features).

===== Data set ===== 
The participants will be provided with data from 9 different languages
(Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish,
Swedish). The data are available in Penn Treebank bracketing format,
CoNLL-X format and optionally in TiGerXML.
In order to ease cross-linguistic comparisons, the data sets are also
released with a common training size setting (i.e., 5000 sentences).
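
For participants new to the dependency format, here is a minimal sketch
of reading CoNLL-X data in Python. It assumes the standard CoNLL-X
layout (ten tab-separated columns per token, blank line between
sentences); the released files may deviate from it, so please check
them first. The function names and the sample tokens are illustrative
only, and taking the first 5000 sentences is just one way to mimic the
common training size; the official reduced training sets should be
preferred.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Column names of the standard CoNLL-X format (10 fields per token).
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(dict(zip(CONLLX_FIELDS, cols)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def first_n_sentences(lines: Iterable[str], n: int = 5000) -> List[List[Token]]:
    """Keep at most the first n sentences (hypothetical helper)."""
    return list(islice(read_conllx(lines), n))


if __name__ == "__main__":
    # Made-up two-sentence sample, not taken from the shared task data.
    sample = (
        "1\tLe\tle\tD\tDET\t_\t2\tdet\t_\t_\n"
        "2\tchat\tchat\tN\tNC\t_\t0\troot\t_\t_\n"
        "\n"
        "1\tOui\toui\tI\tI\t_\t0\troot\t_\t_\n"
    )
    for sent in first_n_sentences(sample.splitlines()):
        print([(tok["form"], tok["head"], tok["deprel"]) for tok in sent])
```

With a real file, the same function can be called on an open file
handle, since it only iterates over lines.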

For all these treebanks, we provide unlabeled data sets as well. To
lower the entry cost for newcomers to the field, we also provide
baseline annotations that are more than accurate, if not state of the
art: morpho-syntactic annotations (POS tags, morphological features,
lemmas and, if available in the original treebank, multiword
expressions) and syntactic dependencies.

===== Shared Task Schedule ===== 
Release of training and dev. data                       May 26
Release of test data                                    July 14
Deadline for submission of test runs                    July 21
Submission and announcement of results                  July 25
Shared task papers due (provisional)                    August 7 
Camera ready papers due                                 August 16

Shared Task papers will be published in the "Working notes of the SPMRL
2014 Shared Task".
Short system descriptions, for publication in the SPMRL-SANCL 2014
proceedings, can optionally be submitted (deadline: July 7;
camera-ready: July 16).

===== Shared task Organizers ===== 
- Djamé Seddah (Univ. Paris Sorbonne & INRIA’s Alpage Project, France) 
- Reut Tsarfaty (Weizmann Institute of Science, Israel) 
- Sandra Kübler (Indiana University, US)

===== SPMRL-SANCL 2014 Organizers ===== 
- Yoav Goldberg (Bar Ilan University, Israel) 
- Yuval Marton (Microsoft Corp., US) 
- Ines Rehbein (Potsdam University, Germany) 
- Yannick Versley (Heidelberg University, Germany) 
- Özlem Çetinoglu (University of Stuttgart, Germany) 
- Joel Tetreault (Yahoo! Labs, US)

===== Contact details ===== 
- Mail: spmrl.sharedtask at 
- Webpage: 
- Wiki (Data set and FAQ) :
- mailing list:
SPMRL-SANCL 2014 website:

===== Endorsements ===== 
This shared task is endorsed by the ACL SIGPARSE interest group and
sponsored by Inria’s Alpage project, the Weizmann Institute of Science
and Indiana University.

For their invaluable help in preparing the SPMRL 2014 Shared Task and
for allowing their data to be part of it, we warmly thank the Linguistic
Data Consortium, the Knowledge Center for Processing Hebrew (MILA),
Ben-Gurion University, Columbia University, the Institute of Computer
Science (Polish Academy of Sciences), the Korea Advanced Institute of
Science and Technology, the University of the Basque Country, Uppsala
University, the University of Gothenburg, the University of
Massachusetts Amherst, the University of Stuttgart, the University of
Szeged, the University of Heidelberg and Université Paris Diderot
(Paris 7).
We are also very grateful to the Philosophical Faculty of the
Heinrich-Heine Universität Düsseldorf for hosting the shared task data
via their dokuwiki.
