26.4281, FYI: Release of 2.0 Genitive Database for German

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Wed Sep 30 15:57:37 UTC 2015


LINGUIST List: Vol-26-4281. Wed Sep 30 2015. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

Editor for this issue: Ashley Parker <ashley at linguistlist.org>
================================================================


Date: Wed, 30 Sep 2015 11:56:56
From: Roman Schneider [schneider at ids-mannheim.de]
Subject: Release of 2.0 Genitive Database for German

Release 2.0 of GenitivDB - Database for German Genitive Classification

We are pleased to announce release 2.0 of GenitivDB, the first database for German genitive classification. It is publicly accessible online, and the underlying dataset can be downloaded for scientific purposes.

GenitivDB is a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. It can be used for the notoriously controversial classification and prediction of German genitive endings (short endings, long endings, zero-marker).
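To illustrate the three categories named above, here is a minimal Python sketch. It is not the method used to build GenitivDB; the function name and the simple string comparison are assumptions made for illustration only. It takes the short ending to be -s and the long ending to be -es, and it returns "other" for any form it cannot match, such as weak nouns in -en or forms with consonant doubling.

```python
def classify_genitive_ending(lemma: str, genitive: str) -> str:
    """Label a genitive form relative to its base form (hypothetical helper).

    Returns "zero" if the form is unchanged, "long" for an added -es,
    "short" for an added -s, and "other" for anything else.
    """
    if genitive == lemma:
        return "zero"
    if genitive == lemma + "es":
        return "long"
    if genitive == lemma + "s":
        return "short"
    return "other"


if __name__ == "__main__":
    examples = [
        ("Tag", "Tags"),
        ("Tag", "Tages"),
        ("Haus", "Hauses"),
        ("Kapitalismus", "Kapitalismus"),
        ("Mensch", "Menschen"),
    ]
    for lemma, genitive in examples:
        print(lemma, genitive, classify_genitive_ending(lemma, genitive))
```

A pair such as Tag / Tags versus Tag / Tages shows why prediction is hard: both forms are attested for the same noun, so a resource with frequency and context metadata is needed to say which one is likely in a given setting.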

For its compilation, we used the DeReKo Reference Corpus, which is the largest linguistic resource worldwide for the study of written German. The corpus data served as the basis for extracting all relevant genitive forms. After several refinements, the resulting collection comprises 650,726 types and 9,541,753 tokens. All extracted forms are enriched with linguistic metadata (morphosyntactic information, phonetic and prosodic data, context information, etc.) as well as extra-linguistic metadata (year of publication, country/region of origin, media type, thematic domain, etc.), for a total of more than 80 different metadata types.

New features of the GenitivDB 2.0 dataset are:

- toponym identification as additional metadata type
- improved identification of proper nouns
- improved identification of time expressions
- adjusted score points (genitive probability value)
- various minor corrections (assignment of genitive endings, handling of zero-markers, etc.)

New features of the GenitivDB 2.0 online form are:

- additional search options (metadata types)
- computation of data distribution
- statistical exploration and visualization via an R-based statistics tool

Online access and download:
http://www.ids-mannheim.de/genitivdb/

Citation: 
Bubenhofer, Noah / Hansen, Sandra / Konopka, Marek / Schneider, Roman (2015):
GenitivDB 2.0 - Datenbank zur Genitivmarkierung (Release vom
01.09.2015). Mannheim: Institut für Deutsche Sprache.
http://www.ids-mannheim.de/genitivdb

Please tell us whenever you publish work based on GenitivDB: grammis at ids-mannheim.de
 
Linguistic Field(s): Applied Linguistics
                     Computational Linguistics
                     Linguistic Theories
                     Syntax
                     Text/Corpus Linguistics

Subject Language(s): German (deu)


