[Corpora-List] 40474 split compounds from GermaNet freely available

Verena Henrich verena.henrich at uni-tuebingen.de
Fri May 31 11:30:46 UTC 2013


Apologies for cross-postings
------------------------

We are happy to announce the availability of 40474 German nominal 
compounds from GermaNet release 8.0 that have been split into their 
constituent parts, i.e., modifier and head. This dataset has been 
constructed semi-automatically and all compound splits have been 
manually post-corrected.

The list of split compounds is freely available for download at
http://www.sfs.uni-tuebingen.de/GermaNet/compounds.shtml

For many applications, it is helpful to know the constituent parts of a 
compound, since its semantic interpretation is usually based on the 
meaning of those parts. Compound splitting is a challenging task for 
German because compounding, a very productive word formation process in 
the language, is not always simple string concatenation. It often 
involves intervening linking elements or the elision of word-final 
characters in the modifier constituent of a compound.
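
To illustrate the three cases just mentioned (plain concatenation, a linking element, and elision in the modifier), here is a minimal Python sketch. It is a toy only and not the semi-automatic procedure used to build the dataset: the tiny lexicon, the list of linking elements, the list of elided endings, and the function name split_compound are all assumptions made for this example.

```python
# Toy illustration only. The linking elements, the elided endings and the
# lexicon below are assumptions for this example, not the GermaNet procedure.
LINKING_ELEMENTS = ("es", "en", "er", "s", "n", "e")  # hypothetical, incomplete
ELIDED_ENDINGS = ("e", "en")                          # hypothetical, incomplete


def split_compound(compound, lexicon):
    """Return candidate (modifier, head) pairs for a two-part compound.

    `lexicon` is a set of lowercase nouns that are accepted as constituents.
    """
    word = compound.lower()
    candidates = []
    # Try every split point that leaves at least two characters on each side.
    for i in range(2, len(word) - 1):
        left, head = word[:i], word[i:]
        if head not in lexicon:
            continue
        # Case 1: plain concatenation, the left part is the modifier itself.
        modifiers = [left]
        # Case 2: a linking element sits between modifier and head.
        for link in LINKING_ELEMENTS:
            if left.endswith(link) and len(left) > len(link):
                modifiers.append(left[:-len(link)])
        # Case 3: word-final characters of the modifier were elided.
        for ending in ELIDED_ENDINGS:
            modifiers.append(left + ending)
        for modifier in modifiers:
            pair = (modifier.capitalize(), head.capitalize())
            if modifier in lexicon and pair not in candidates:
                candidates.append(pair)
    return candidates


# Hypothetical mini-lexicon, just large enough for the three examples.
lexicon = {"haus", "tür", "arbeit", "zimmer", "schule", "buch"}

print(split_compound("Haustür", lexicon))        # [('Haus', 'Tür')]
print(split_compound("Arbeitszimmer", lexicon))  # [('Arbeit', 'Zimmer')]
print(split_compound("Schulbuch", lexicon))      # [('Schule', 'Buch')]
```

With a realistic lexicon such a naive approach will typically produce several competing candidates for one compound, which is one reason why manually post-corrected splits are useful.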

For more information about GermaNet, please consult the project website: 
http://www.sfs.uni-tuebingen.de/GermaNet/

--
Verena Henrich
Seminar für Sprachwissenschaft
Universität Tübingen

Wilhelmstr. 19 (Raum 2.24)
72074 Tübingen
Germany

http://www.verenahenrich.de

Tel.: +49 (0)7071 2977313
Fax: +49 (0)7071 295214




