[Corpora-List] [Data release] SPMRL 2014 Shared Task (Statistical Parsing of Morphologically-Rich Languages)

Mon May 26 04:06:34 UTC 2014

*** apologies for cross-posting *** 

============== SPMRL 2014 SHARED TASK ============== 
The first joint SPMRL-SANCL workshop (www.spmrl.org),
co-located with COLING 2014, hosts the second Shared Task on
Parsing Morphologically-Rich Languages.

===== Summary ===== 
Following the success of the first shared task, a second edition
is being launched, this year with an emphasis on semi-supervised
techniques. In addition to the treebank data (phrase structures and
dependencies), large unlabeled data sets with automatic annotations
are made available.

Webpage: http://www.spmrl.org/spmrl2014-sharedtask.html
Test submission deadline: July 21, 2014
(test data release: July 14, 2014)

===== Introduction =====
The primary goal of the First Shared Task on Parsing Morphologically-Rich
Languages was to advance work on parsing morphologically ambiguous
input, in both dependency and constituency frameworks, and to
establish the state of the art for MRLs. In the longer term, we aim
to provide streamlined data sets and evaluation metrics, thus
improving the comparability of cross-linguistic work on parsing
MRLs.

The 2014 edition of the Shared Task explicitly allows and encourages
the use of large unlabeled data sets. In order to properly evaluate
the improvement brought by semi-supervised models, all annotated
data and the evaluation process remain the same.

The shared task features tracks in constituency parsing and in
dependency parsing, in both gold and realistic scenarios (in the
realistic scenario, no gold tokenization, part-of-speech tags or
morphological features are provided).

===== Data set ===== 
The participants will be provided with data from 9 different languages 
(Arabic, Basque, French, German, Hebrew, Hungarian, Korean, 
Polish, Swedish). The data are available in Penn Treebank bracketing 
format, CoNLL-X format and optionally in TiGerXML.
To ease cross-linguistic comparison, the data sets are also
released with a common training-set size (i.e., 5,000 sentences).

For all these treebanks, we also provide unlabeled data sets. To
lower the entry cost for newcomers to the field, we also provide
morpho-syntactic annotations (POS tags, morphological features,
lemmas and multiword expressions where available in the original
treebank) and syntactic dependencies whose accuracy is well above
baseline, if not state of the art.

===== Shared Task Schedule ===== 
Release of training and dev. data         May 26
Release of test data                      July 14
Deadline for submission of test runs      July 21
Submission and announcement of results    July 25
Shared task papers due (provisional)      August 7
Camera-ready papers due                   August 16

Shared Task papers will be published in the "Working Notes of the
SPMRL 2014 Shared Task".
Short system descriptions can optionally be submitted for publication
in the SPMRL-SANCL 2014 proceedings (deadline: July 7;
camera-ready: July 16).

===== Shared task Organizers ===== 
- Djamé Seddah (Univ. Paris Sorbonne & INRIA’s Alpage Project, France) 
- Reut Tsarfaty (Weizmann Institute of Science, Israel) 
- Sandra Kübler (Indiana University, US)

===== SPMRL-SANCL 2014 Organizers ===== 
- Yoav Goldberg (Bar Ilan University, Israel) 
- Yuval Marton (Microsoft Corp., US) 
- Ines Rehbein (Potsdam University, Germany) 
- Yannick Versley (Heidelberg University, Germany) 
- Özlem Çetinoğlu (University of Stuttgart, Germany)
- Joel Tetreault (Yahoo! Labs, US)

===== Contact details ===== 
- Mail: spmrl.sharedtask at gmail.com
- Webpage: http://www.spmrl.org/spmrl2014-sharedtask.html
- Wiki (data set and FAQ): http://dokufarm.phil.hhu.de/spmrl2014/doku.php
- Mailing list: https://sympa.inria.fr/sympa/arc/mrlp-sharedtask/2013-06/
- SPMRL-SANCL 2014 website: http://www.spmrl.org

===== Endorsements ===== 
This shared task is endorsed by the ACL SIGPARSE interest group
and sponsored by Inria’s Alpage project, the Weizmann Institute
of Science and Indiana University.

For their invaluable help in preparing the SPMRL 2014 Shared Task and
for allowing their data to be part of it, we warmly thank the
Linguistic Data Consortium, the Knowledge Center for Processing
Hebrew (MILA), Ben Gurion University, Columbia University, the
Institute of Computer Science (Polish Academy of Sciences), the Korea
Advanced Institute of Science and Technology, the University of the
Basque Country, Uppsala University, the University of Gothenburg, the
University of Massachusetts Amherst, the University of Stuttgart, the
University of Szeged, the University of Heidelberg and Université
Paris Diderot (Paris 7).
We are also very grateful to the Philosophical Faculty of the
Heinrich-Heine-Universität Düsseldorf for hosting the shared task
data via their DokuWiki.
