[Corpora-List] [Data release] SPMRL 2014 Shared Task (Statistical Parsing of Morphologically-Rich Languages)
Djamé Seddah
djame.seddah at free.fr
Mon May 26 04:06:34 UTC 2014
*** apologies for cross-posting ***
============== SPMRL 2014 SHARED TASK ==============
The first joint SPMRL-SANCL workshop (www.spmrl.org),
co-located with COLING 2014, hosts the second Shared Task on
Parsing Morphologically-Rich Languages.
===== Summary =====
Following the success of the first shared task, a second edition
is launched, with an emphasis this year on the use of semi-supervised
techniques. On top of the treebank data (phrase structures and
dependencies), large annotated data sets are made available.
Webpage: http://www.spmrl.org/spmrl2014-sharedtask.html
Test submission deadline: July 21, 2014
(test data release: July 14, 2014)
===== Introduction =====
The primary goal of the First Shared Task on Parsing Morphologically-Rich
Languages was to bring forward work on parsing morphologically
ambiguous input in both dependency and constituency parsing, and
to show the state of the art for MRLs. In the longer term, we aim
to provide streamlined data sets and evaluation metrics, thus
improving the comparability of cross-linguistic work on parsing
MRLs.
The 2014 Shared Task edition will explicitly allow and favor the
use of large unlabeled data sets. In order to properly evaluate the
improvement brought by the use of semi-supervised models, all
annotated data and the evaluation process will remain the same
as in the first shared task.
The shared task features tracks in constituency parsing and in
dependency parsing, in gold as well as in realistic scenarios (the
realistic scenario has no gold tokenization and no gold part-of-speech
tags or morphological features).
===== Data set =====
The participants will be provided with data from 9 different languages
(Arabic, Basque, French, German, Hebrew, Hungarian, Korean,
Polish, Swedish). The data are available in Penn Treebank bracketing
format, CoNLL-X format and optionally in TiGerXML.
In order to ease cross-linguistic comparisons, the data sets are
also released with a common training size setting (i.e., 5000 sentences).
For all these treebanks, we provide unlabeled data sets as well. To
lower the entry cost for newcomers to the field, we also provide
accurate, if not state-of-the-art, baseline morpho-syntactic
annotations (POS tags, morphological features, lemmas and multiword
expressions if available in the original treebank) and syntactic
dependencies.
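For participants new to the dependency format, the following is a minimal
Python sketch (not part of the official shared task tools) of how a CoNLL-X
file could be read sentence by sentence. It relies only on the standard
CoNLL-X layout: ten tab-separated columns per token and a blank line between
sentences. The function names are hypothetical, and the assumption that the
FEATS column holds "key=value" pairs separated by "|" should be checked
against the released data for each language.

```python
import itertools
from typing import Dict, Iterable, Iterator, List

# The ten tab-separated columns defined by the CoNLL-X format.
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]

Token = Dict[str, str]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        sentence.append(dict(zip(CONLLX_FIELDS, columns)))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS value into a dictionary.

    Assumption: features are 'key=value' pairs separated by '|',
    and '_' marks an empty feature set.
    """
    if feats == "_":
        return {}
    pairs = (item.partition("=") for item in feats.split("|"))
    return {key: value for key, _, value in pairs}


def first_n_sentences(sentences: Iterable[List[Token]], n: int = 5000) -> List[List[Token]]:
    """Keep the first n sentences, e.g. to mimic a fixed training size."""
    return list(itertools.islice(sentences, n))
```

With a file opened as `f`, `first_n_sentences(read_conllx(f))` would return at
most 5000 sentences. The released 5000-sentence training sets should be
preferred over such a cut, since how they were selected is not specified here.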
===== Shared Task Schedule =====
Release of training and dev. data          May 26
Release of test data                       July 14
Deadline for submission of test runs       July 21
Submission and announcement of results     July 25
Shared task papers due (provisional)       August 7
Camera ready papers due                    August 16
Shared Task papers will be published in the "Working notes of the
SPMRL 2014 Shared Task".
Short system descriptions for publication in the SPMRL-SANCL 2014
proceedings can optionally be submitted (deadline: July 7;
camera ready: July 16).
===== Shared task Organizers =====
- Djamé Seddah (Univ. Paris Sorbonne & INRIA’s Alpage Project, France)
- Reut Tsarfaty (Weizmann Institute of Science, Israel)
- Sandra Kübler (Indiana University, US)
===== SPMRL-SANCL 2014 Organizers =====
- Yoav Goldberg (Bar Ilan University, Israel)
- Yuval Marton (Microsoft Corp., US)
- Ines Rehbein (Potsdam University, Germany)
- Yannick Versley (Heidelberg University, Germany)
- Özlem Çetinoglu (University of Stuttgart, Germany)
- Joel Tetreault (Yahoo! Labs, US)
===== Contact details =====
- Mail: spmrl.sharedtask at gmail.com
- Webpage: http://www.spmrl.org/spmrl2014-sharedtask.html
- Wiki (data sets and FAQ): http://dokufarm.phil.hhu.de/spmrl2014/doku.php
- Mailing list: https://sympa.inria.fr/sympa/arc/mrlp-sharedtask/2013-06/
- SPMRL-SANCL 2014 website: http://www.spmrl.org
===== Endorsements =====
This shared task is endorsed by the ACL SIGPARSE interest group
and sponsored by Inria’s Alpage project, the Weizmann Institute
of Science and Indiana University.
For their invaluable help in preparing the SPMRL 2014 Shared Task and
for allowing their data to be part of it, we warmly thank the
Linguistic Data Consortium, the Knowledge Center for Processing
Hebrew (MILA), Ben Gurion University, Columbia University, the
Institute of Computer Science (Polish Academy of Sciences), the Korea
Advanced Institute of Science and Technology, the University of the
Basque Country, Uppsala University, the University of Gothenburg, the
University of Massachusetts Amherst, the University of Stuttgart, the
University of Szeged, the University of Heidelberg and Université
Paris Diderot (Paris 7).
We are also very grateful to the Philosophical Faculty of the Heinrich-Heine
Universität Düsseldorf for hosting the shared task data via their dokuwiki.