27.82, FYI: Call for Participation: EmpiriST Shared Task
The LINGUIST List via LINGUIST
linguist at listserv.linguistlist.org
Tue Jan 5 16:35:19 UTC 2016
LINGUIST List: Vol-27-82. Tue Jan 05 2016. ISSN: 1069 - 4875.
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org
***************** LINGUIST List Support *****************
25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
http://funddrive.linguistlist.org/donate/
Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================
Date: Tue, 05 Jan 2016 11:35:11
From: Kay-Michael Würzner [wuerzner at bbaw.de]
Subject: Call for Participation: EmpiriST Shared Task
Call for Participation: EmpiriST Shared Task on Processing German CMC/Social Media & Web Data
The EmpiriST 2015 shared task aims to encourage developers of NLP applications to adapt their tools and resources to the processing of written German discourse in genres of computer-mediated communication (CMC) – such as chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues – as well as monological web pages – such as personal or professional blogs, Wikipedia articles and academic sites.
The shared task comprises two subtasks (A: tokenization, B: POS tagging) and two different data sets (CMC subset, web corpora subset). While our main goal is to foster the development of robust tools that work well on a wide range of CMC & web genres, teams are allowed to focus on one subtask or one subset only. The full, manually annotated training data are now available on the EmpiriST homepage and comprise approx. 5000 tokens for each subset.
Results and system descriptions will be presented at the WAC-X workshop, co-located with ACL 2016 in Berlin, Germany (11 or 12 August 2016).
For more information, including detailed annotation guidelines and instructions for participation, see the EmpiriST homepage at
https://sites.google.com/site/empirist2015/
and join our Google group for updates, questions and discussion:
https://groups.google.com/d/forum/empirist2015
While EmpiriST is focussed on the annotation of German-language data, familiarity with German is not essential for participating in the task. The training data are sufficient for general machine learning, domain adaptation and optimization approaches. We also provide an English summary of the POS tagset and annotation guidelines.
Schedule:
- 20.12.2015: Release of the training data
- 31.01.2016: Team registration
- 15.02.2016: Release of the evaluation data for the tokenization subtask
- 19.02.2016: Submission deadline for the tokenization subtask
- 22.02.2016: Release of the evaluation data for the POS-tagging subtask
- 26.02.2016: Submission deadline for the POS-tagging subtask
- ca. April 2016: Submission of system description papers
- 11/12.08.2016: Presentation of systems and task results at WAC-X workshop (ACL 2016, Berlin)
Task Force:
CMC data set:
- Michael Beißwenger (Technische Universität Dortmund)
- Kay-Michael Würzner (Berlin-Brandenburgische Akademie der Wissenschaften)
Web corpora data set:
- Sabine Bartsch (Technische Universität Darmstadt)
- Stefan Evert (Universität Erlangen-Nürnberg)
Contact address:
empirist at collocations.de
Linguistic Field(s): Computational Linguistics
Subject Language(s): German (deu)