Corpora: Frequency Information and Lexical Verb Subcategorisation for German

Sabine Schulte im Walde schulte at IMS.Uni-Stuttgart.DE
Fri Dec 7 09:52:02 UTC 2001


Dear list members,

We have created frequency lists of word forms, word-tag pairs,
lemma-tag pairs, etc. for German. The lists are similar in content and
style to those compiled by Adam Kilgarriff for the BNC. In addition, we
provide verb subcategorisation information for German, such as
frequency and probability distributions over frame types. All data was
obtained from a lexicalised statistical grammar model trained on
35 million words of German newspaper data.

Examples of the lexical information are given at
 http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html
The full data is freely available on request for non-commercial
purposes.

Regards,
Sabine Schulte im Walde.


