Corpora: Frequency Information and Lexical Verb Subcategorisation for German
Sabine Schulte im Walde
schulte at IMS.Uni-Stuttgart.DE
Fri Dec 7 09:52:02 UTC 2001
Dear list members,
We have created frequency lists of word forms, word-tag pairs, lemma-tag
pairs, etc. for German. The lists are similar in content and style to
those from Adam Kilgarriff for the BNC. In addition, we provide verb
subcategorisation information for German, such as frequency and
probability distributions over frame types. All data was obtained
from a lexicalised statistical grammar model, trained on 35 million
words of German newspaper data.
Examples of the lexical information are given at
http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html
The full data is freely available on request for non-commercial
purposes.
Regards,
Sabine Schulte im Walde.