30.1370, FYI: GermEval 2019 Task 1 - 2nd Call for Participation
The LINGUIST List
linguist at listserv.linguistlist.org
Thu Mar 28 02:57:57 UTC 2019
LINGUIST List: Vol-30-1370. Wed Mar 27 2019. ISSN: 1069 - 4875.
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================
Date: Wed, 27 Mar 2019 22:56:48
From: Steffen Remus [remus at informatik.uni-hamburg.de]
Subject: GermEval 2019 Task 1 - 2nd Call for Participation
GermEval 2019 Task 1 - Shared Task on hierarchical classification of German
Blurbs (short texts)
2nd Call for Participation:
We invite interested parties to participate in this shared task. Further
information can be found here:
https://competitions.codalab.org/competitions/21226
Hierarchical multi-label classification (HMC) of blurbs is the task of
assigning multiple labels to short descriptive texts of books, where each
label is part of an underlying hierarchy of categories. The increasing amount
of available digital documents and the need for more and finer-grained
categories call for new, more robust and sophisticated text classification
methods. Large datasets often incorporate a categorical hierarchy, which can
be used to organize documents at different levels of specificity.
Traditional multi-class text classification approaches are thoroughly
researched, but they fail to generalize adequately as the amount of available
data grows and hierarchies become more specific.
With this task we aim to foster research within the HMC context. The task
focuses on classifying German books into their respective hierarchically
structured writing genres using short advertisement texts (blurbs). The data
contains further meta information such as author, page number, release date,
etc.
Tasks:
This shared task consists of two subtasks, described below. You can
participate in one of them, or in both.
- Subtask A: The task is to classify German books into one or more of the most
general writing genres. Therefore, it can be considered a multi-label
classification task. In total, there are 8 classes that can be assigned to a
book: Literatur & Unterhaltung, Ratgeber, Kinderbuch & Jugendbuch, Sachbuch,
Ganzheitliches Bewusstsein, Glaube & Ethik, Künste, Architektur & Garten.
- Subtask B: The second task targets hierarchical multi-label classification
into multiple writing genres. In addition to the very general writing genres,
additional genres of different specificity can be assigned to a book. In
total, there are 343 different classes that are hierarchically structured on
up to 4 levels.
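
The relation between the two subtasks can be sketched in a few lines of
Python. This is only an illustration, not the organizers' data format or
baseline: the sub-genre names and the child-to-parent table PARENT are
hypothetical placeholders, and the sketch assumes that assigning a genre also
implies all of its ancestors in the hierarchy. Only the eight top-level genre
names are taken from the task description.

```python
# The eight most general genres named for Subtask A.
TOP_LEVEL = [
    "Literatur & Unterhaltung",
    "Ratgeber",
    "Kinderbuch & Jugendbuch",
    "Sachbuch",
    "Ganzheitliches Bewusstsein",
    "Glaube & Ethik",
    "Künste",
    "Architektur & Garten",
]

# HYPOTHETICAL excerpt of a child -> parent hierarchy, for illustration only.
# The real hierarchy (343 classes, up to 4 levels) comes with the task data.
PARENT = {
    "Krimi & Thriller": "Literatur & Unterhaltung",
    "Psychothriller": "Krimi & Thriller",
    "Gesunde Ernährung": "Ratgeber",
}


def with_ancestors(labels):
    """Return the given labels plus every ancestor reachable via PARENT."""
    closed = set()
    for label in labels:
        # Walk upwards until a root or an already visited label is reached.
        while label is not None and label not in closed:
            closed.add(label)
            label = PARENT.get(label)
    return closed


def top_level_vector(labels):
    """Multi-hot vector over TOP_LEVEL (a Subtask A style target)."""
    closed = with_ancestors(labels)
    return [1 if genre in closed else 0 for genre in TOP_LEVEL]


if __name__ == "__main__":
    book_labels = {"Psychothriller", "Gesunde Ernährung"}
    print(sorted(with_ancestors(book_labels)))
    print(top_level_vector(book_labels))
```

Under these assumptions, a Subtask B label set can be projected onto a
Subtask A target by keeping only its top-level ancestors.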
Data:
The complete dataset for this task consists of 20,784 examples in total.
Sample data is provided so that participants can familiarize themselves with
the data structure. 14,548 training samples have been released and can be
downloaded after registering for the shared task. We accept submissions for
the validation set (2,079 samples) and publish a leaderboard via the CodaLab
page. The final evaluation of the task will take place in July 2019; for
this, the true labels for the validation set will be provided as additional
training data. More information
can be found on the task's webpage at:
https://competitions.codalab.org/competitions/21226
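
The size of the test set is not stated explicitly in this call. As a rough
check, it can be derived from the figures above, assuming that the training,
validation and test sets are disjoint and together make up the complete
dataset (an assumption, not a statement from the organizers):

```python
# Figures quoted in the call for participation.
TOTAL = 20_784
TRAIN = 14_548
VALIDATION = 2_079

# ASSUMPTION: the three splits are disjoint and cover the whole dataset,
# so the remainder is the test set.
test = TOTAL - TRAIN - VALIDATION

splits = {"train": TRAIN, "validation": VALIDATION, "test (derived)": test}
for name, size in splits.items():
    print(f"{name}: {size} ({size / TOTAL:.1%})")
```

Under this assumption the remainder is 4,157 samples, i.e. roughly a
70/10/20 split.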
Important Dates:
- Jan 2019: Release of trial data
- Feb 01, 2019: Release of training data (train + validation)
- Jun 01, 2019: Release of test data
- July 15, 2019: Final submission of test results
- July 31, 2019: Submission of description paper
- Aug 2019: Workshop in Nürnberg/Erlangen, Germany at the Conference on
Natural Language Processing KONVENS 2019 (https://dgfs.de/de/cl/konvens.html)
Organizers:
The task is organized by Rami Aly, Steffen Remus and Chris Biemann, Language
Technology, Department of Informatics, Universität Hamburg.
https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html
GermEval:
GermEval is a series of shared task evaluation campaigns that focus on Natural
Language Processing for the German language. GermEval has been conducted four
times since 2014 in co-location with KONVENS/GSCL conferences.
Linguistic Field(s): Applied Linguistics; Computational Linguistics; Semantics
Subject Language(s): German (deu)