[Corpora-List] 40474 split compounds from GermaNet freely available
Verena Henrich
verena.henrich at uni-tuebingen.de
Fri May 31 11:30:46 UTC 2013
Apologies for cross-postings
------------------------
We are happy to announce the availability of 40474 German nominal
compounds from GermaNet release 8.0 that have been split into their
constituent parts, i.e., modifier and head. This dataset has been
constructed semi-automatically and all compound splits have been
manually post-corrected.
The list of split compounds is freely available for download at
http://www.sfs.uni-tuebingen.de/GermaNet/compounds.shtml
For many applications, it is helpful to know the parts of a compound,
since its semantic interpretation is usually based on the meanings of
those parts. Compound splitting is a challenging task for German because
compounding, a very productive word formation process in the language,
is not always simple string concatenation. It often involves intervening
linking elements (e.g., the "s" in "Arbeitszimmer") or the elision of
word-final characters in the modifier constituent (e.g., the dropped "e"
of "Schule" in "Schulbuch").
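To illustrate why plain concatenation is not enough, here is a minimal
Python sketch of a lexicon-based splitter that also tries linking
elements and elision of a word-final "e". It is not the procedure used
to build the GermaNet dataset, which is described above only as
semi-automatic with manual post-correction. The toy lexicon, the list of
linking elements, and the function name split_compound are assumptions
made for illustration.

```python
# Hypothetical toy lexicon; a real splitter would need a full word list.
LEXICON = {"arbeit", "zimmer", "schule", "buch", "haus", "tür", "kind", "garten"}

# Common German linking elements; "" stands for plain concatenation.
# This list is an illustrative assumption, not an exhaustive inventory.
LINKING_ELEMENTS = ("", "s", "es", "n", "en", "er", "e")


def split_compound(word, lexicon=LEXICON):
    """Return all (modifier, head) candidates for a two-part compound."""
    word = word.lower()
    candidates = []
    for i in range(2, len(word) - 1):
        left, head = word[:i], word[i:]
        if head not in lexicon:
            continue
        # Case 1: modifier, optionally followed by a linking element.
        for link in LINKING_ELEMENTS:
            if not left.endswith(link):
                continue
            stem = left[: len(left) - len(link)]
            if stem in lexicon:
                candidates.append((stem, head))
        # Case 2: a word-final "e" of the modifier has been elided.
        if left + "e" in lexicon:
            candidates.append((left + "e", head))
    return candidates


if __name__ == "__main__":
    for compound in ("Haustür", "Arbeitszimmer", "Kindergarten", "Schulbuch"):
        print(compound, split_compound(compound))
```

The sketch returns every candidate rather than a single best split,
since real compounds are often ambiguous. Resolving such ambiguity is
one reason manual post-correction, as done for this dataset, is valuable.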
For more information about GermaNet, please consult the project website:
http://www.sfs.uni-tuebingen.de/GermaNet/
--
Verena Henrich
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstr. 19 (Raum 2.24)
72074 Tübingen
Germany
http://www.verenahenrich.de
Tel.: +49 (0)7071 2977313
Fax: +49 (0)7071 295214