[Corpora-List] Available: German Named Entity Recognition resources
Sebastian Padó
pado at ims.uni-stuttgart.de
Thu Jun 24 09:29:16 UTC 2010
Dear all,
We are glad to announce two new resources for German Named Entity
Recognition that are freely available for research purposes.
The first resource is a German classifier for the CRF-based Stanford
NER system that has been trained on the German CoNLL 2003 dataset. It
distinguishes four classes of NEs: person, location, organization,
other. It includes features based on lexical clusters obtained from a
large (175M tokens) corpus of unlabelled German text, which improve
recall by up to 10%.
The second resource consists of two EUROPARL transcripts annotated
with Named Entities using the same scheme. The total size is about
110,000 tokens.
According to our evaluation, the classifier is currently among the
best NER systems for German.
Condition     | Test set           | Prec | Rec  | F-1
--------------+--------------------+------+------+-----
In-domain     | (CoNLL 2003 testb) | 86.6 | 71.2 | 78.2
Out-of-domain | (EUROPARL)         | 78.0 | 56.7 | 65.6
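For readers who want to check the table: F-1 is the harmonic mean of precision and recall. The short Python sketch below recomputes it from the rounded figures given above. The function name f1_score is our own choice for illustration and is not part of the released resources. Because the inputs are already rounded to one decimal, the recomputed value may differ from the published one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall figures copied from the table above.
results = {
    "In-domain (CoNLL 2003 testb)": (86.6, 71.2),
    "Out-of-domain (EUROPARL)": (78.0, 56.7),
}

for condition, (prec, rec) in results.items():
    print(f"{condition}: F-1 = {f1_score(prec, rec):.2f}")
```

The same function works for scores expressed as fractions (0 to 1) as well as percentages, as long as both arguments use the same scale.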
For more information and downloads, please visit
http://nlpado.de/~sebastian/ner_german.html
Sincerely,
Manaal Faruqui & Sebastian Pado
IMS, University of Stuttgart