[Corpora-List] GermEval 2014 Final Call

Chris Biemann biem at lt.informatik.tu-darmstadt.de
Mon Jul 28 12:48:27 UTC 2014


(sorry for cross-posting; please forward to anyone who might be interested)

FINAL CALL FOR PARTICIPATION AND SUBMISSIONS

GermEval 2014 Named Entity Recognition Shared Task for German
=============================================================
https://sites.google.com/site/germeval2014ner/

October 7, 2014,
Co-located with KONVENS 2014, October 8-10, Hildesheim, Germany

*** NEW: Data has been updated and slight inconsistencies have been 
corrected ***

Description
------------
Named Entity Recognition (NER) has been shown to be useful for a wide 
range of NLP tasks, from Information Extraction to Speech Processing.
For Semantic Web applications like entity linking, NER is a crucial 
preprocessing step.
Even though German is a relatively well-resourced language, NER for 
German has been challenging, both because capitalization is a less 
useful feature than in other languages (all German nouns are 
capitalized, not just proper names) and because existing training data 
sets are encumbered by license problems. As a result, there are no 
publicly available NER taggers for German that are both free of usage 
restrictions and highly accurate.

The GermEval 2014 NER Shared Task makes CC-licensed German data with NER 
annotation available, with the goals of significantly advancing the 
state of the art in German NER and of pushing the field of NER towards 
nested representations of named entities.

We invite all researchers and industry professionals to participate in 
the challenge and to demonstrate their ability to build a Named Entity 
Recognition system for German. Systems will be evaluated on a manually 
annotated test set; training and development data will be provided. 
There are no restrictions on the type of NER system submitted, nor on 
the use of external data, background corpora, lexical resources, etc.

GermEval 2014 NER is associated with the KONVENS 2014 conference and 
will take place as a KONVENS workshop in Hildesheim in October 2014.


Task Setup
----------
The GermEval 2014 NER Shared Task builds on a new dataset with German 
Named Entity annotation [1] with the following properties:

- The data was sampled from German Wikipedia and news corpora as a 
collection of citations.
- The dataset covers over 31,000 sentences corresponding to over 590,000 
tokens.
- The NER annotation follows the NoSta-D guidelines, which extend the 
Tübingen Treebank guidelines; they use four main NER categories with 
sub-structure and annotate embedded (nested) NEs, such as [ORG FC 
Kickers [LOC Darmstadt]].
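To make the embedding idea concrete, here is a minimal Python sketch of 
one way nested annotations can be flattened into per-token tags: outer 
entities go into one BIO column and embedded entities into a second. 
The two-column (outer/embedded) layout and the function name 
spans_to_two_level_bio are assumptions for illustration, not the 
official conversion; the guidelines and data define the actual format. 
Only one level of embedding is handled.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def spans_to_two_level_bio(n_tokens: int, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: encode outer and embedded entities as two
    parallel BIO tag sequences.

    A span is "inner" if it lies inside another, different span; all other
    spans are "outer". Deeper nesting (three or more levels) is not handled.
    """
    outer = ["O"] * n_tokens
    inner = ["O"] * n_tokens
    for start, end, label in spans:
        is_inner = any(
            s <= start and end <= e and (s, e) != (start, end)
            for s, e, _ in spans
        )
        tags = inner if is_inner else outer
        tags[start] = "B-" + label  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label  # continuation tokens
    return outer, inner


# [ORG FC Kickers [LOC Darmstadt]] as token spans
tokens = ["FC", "Kickers", "Darmstadt"]
outer, inner = spans_to_two_level_bio(len(tokens), [(0, 3, "ORG"), (2, 3, "LOC")])
for tok, o, i in zip(tokens, outer, inner):
    print(tok, o, i, sep="\t")
```

Here "Darmstadt" carries I-ORG in the outer column and B-LOC in the 
embedded column, so both readings survive in a flat, token-per-line 
representation.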

Data and Guidelines are available for download at 
https://sites.google.com/site/germeval2014ner/

We split the dataset [1] into training, development and test sets and 
provide the datasets in a tab-separated (TSV) format.
- Training Set
- Development Set
- Test Set (available in unannotated form from August 1, 2014, and in 
annotated form from September 1, 2014)
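Reading the TSV files might look roughly like the sketch below. The 
column layout is an assumption here (comment lines starting with "#", a 
blank line after each sentence, and four tab-separated columns per 
token: index, token, outer tag, embedded tag); check it against the 
downloaded files before relying on it. read_sentences is a hypothetical 
helper, and the sample input is made up.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Row = Tuple[str, str, str]  # (token, outer tag, embedded tag)


def read_sentences(stream: TextIO) -> Iterator[List[Row]]:
    """Yield sentences as lists of (token, outer, embedded) rows.

    Assumed format: '#' lines are comments, blank lines end a sentence,
    token lines hold tab-separated columns index, token, outer, embedded.
    """
    sentence: List[Row] = []
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # skip comment/metadata lines
        if not line.strip():
            if sentence:  # blank line closes the current sentence
                yield sentence
                sentence = []
            continue
        _index, token, outer, embedded = line.split("\t")[:4]
        sentence.append((token, outer, embedded))
    if sentence:  # file may not end with a blank line
        yield sentence


sample = (
    "# hypothetical example sentence\n"
    "1\tFC\tB-ORG\tO\n"
    "2\tKickers\tI-ORG\tO\n"
    "3\tDarmstadt\tI-ORG\tB-LOC\n"
    "\n"
)
sentences = list(read_sentences(io.StringIO(sample)))
print(len(sentences), sentences[0][2])
```

With a real file, pass an open text stream instead, e.g. 
open(path, encoding="utf-8").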

Further, we provide an evaluation script (adapted from the CoNLL 
competitions) that scores a given TSV file against a gold standard. 
The evaluation script and its manual are also available for download at 
https://sites.google.com/site/germeval2014ner/ .
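For orientation, the core of a CoNLL-style evaluation is exact-match 
scoring of entity chunks: a predicted entity counts only if both its 
boundaries and its type match the gold standard. The sketch below 
computes precision, recall and F1 for a single BIO tag column. It is not 
the official script; details such as how outer and embedded levels are 
combined should be taken from the official script and its manual. The 
names chunks and prf are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def chunks(tags: List[str]) -> Set[Chunk]:
    """Collect entity chunks from one BIO tag sequence.

    An I- tag that does not continue a chunk of the same type is treated
    as the start of a new chunk (a lenient convention, assumed here).
    """
    found: Set[Chunk] = set()
    start: Optional[int] = None
    kind: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final chunk
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == kind
        if kind is not None and not continues:
            found.add((start, i, kind))  # close the open chunk
            start, kind = None, None
        if prefix in ("B", "I") and not continues:
            start, kind = i, label  # open a new chunk
    return found


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_chunks, p_chunks = chunks(g), chunks(p)
        tp += len(g_chunks & p_chunks)  # same span and same type
        n_gold += len(g_chunks)
        n_pred += len(p_chunks)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "B-ORG"]]  # second entity has the wrong type
print(prf(gold, pred))
```

With one of the two entities labeled correctly, precision, recall and F1 
all come out at 0.5; a correct span with the wrong type earns no credit.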


There is only one track: participants may use arbitrary knowledge 
sources to model the data. Participants may submit up to three runs.

Submissions consist of a TSV file providing predictions for the test 
data and a paper of up to 4 pages (including references) describing the 
chosen approach and analyzing its performance. Papers should follow the
KONVENS 2014 style files. The papers will be published online. We expect 
authors to present summaries of their systems at the KONVENS workshop.


Important Dates
---------------

March 1, 2014: Call for Participation; incl. training and development data
May 25, 2014: Second Call for Participation; incl. evaluation framework
July 25, 2014: Final Call for Participation
August 1-15, 2014: Availability of test data and submission of model results
August 15, 2014: Deadline for Shared Task description submissions
September 1, 2014: Notification of Acceptance and Shared Task Results
September 15, 2014: Deadline for camera-ready papers
October 7, 2014: GermEval NER workshop @ KONVENS
October 8-10, 2014: KONVENS Main Conference


Organizers
----------
Chris Biemann
Language Technology, Technische Universität Darmstadt
biem(AT)cs.tu-darmstadt.de

Sebastian Padó
IMS, Stuttgart University
pado(AT)ims.uni-stuttgart.de



[1] D. Benikova, C. Biemann, M. Reznicek. NoSta-D Named Entity 
Annotation for German: Guidelines and Dataset. Proceedings of LREC 2014, 
Reykjavik, Iceland.


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora