[Corpora-List] Frequency of masc./fem/neut. in German
Andras Kornai
andras at kornai.com
Fri Apr 17 15:54:11 UTC 2009
On Wed, Apr 15, 2009 at 10:35:13AM -0700, Dan I. Slobin wrote:
>
> How does this count treat noun compounds? E.g., das Werk, der
> Werkfuehrer, die Werkstatt... / die Kammer, das Kammerwasser, der
> Kammerbeamter...
Dan,
to the extent that compounds inherit their gender from their head, it is
extremely unlikely that the overall numbers would change much: that
would require some special effect that impacts the productivity of
masc, fem, or neut bases differentially. You can observe the same broad
tendency (neuters contributing only about 15%, the rest split about
equally between fem and masc) by simply counting die, der, and das
in running text. In 10.1m words from Project Gutenberg (typically 19th
c. or earlier material) you find
242894 die
238893 der
106332 das
and similarly for 1990s newspaper text (8.4m words of Der Spiegel)
284777 die
265051 der
86214 das
Such numbers are easily swayed by style: compare a 14.9m word sample
from Frankfurter Rundschau from the same year, which has
501637 der
497189 die
143069 das
Given that, and the fact that plurals would favor die over der, the
numbers are largely consistent with Sven's findings (but are obtained
with far less work).
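To make the comparison concrete, here is a minimal Python sketch (not the script used for the counts above) that tallies die, der, and das in running text and turns the three reported sets of counts into shares. The function names, the case-insensitive whole-word matching, and the corpus labels are assumptions of this sketch; the tokenization behind the original counts is not specified.

```python
import re
from collections import Counter

ARTICLES = ("die", "der", "das")

def count_articles(text):
    """Count whole-word occurrences of die/der/das, ignoring case.

    Assumption: a token is any maximal run of letters.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t in ARTICLES)
    return {a: counts[a] for a in ARTICLES}

def shares(counts):
    """Convert raw counts into fractions of their sum."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Counts exactly as reported in this message.
REPORTED = {
    "Project Gutenberg (10.1m words)":
        {"die": 242894, "der": 238893, "das": 106332},
    "Der Spiegel (8.4m words)":
        {"die": 284777, "der": 265051, "das": 86214},
    "Frankfurter Rundschau (14.9m words)":
        {"die": 497189, "der": 501637, "das": 143069},
}

for name, counts in REPORTED.items():
    formatted = {k: f"{v:.1%}" for k, v in shares(counts).items()}
    print(name, formatted)
```

On these figures das accounts for roughly 18%, 14%, and 13% of the three forms in the Gutenberg, Spiegel, and Frankfurter Rundschau samples respectively, with die and der sharing the remainder almost evenly.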
> Here are some type counts based on noun readings (and not noun
> lemmas) in two computational lexica for German,
> ignoring readings with more than 1 possible gender:
>                    fem    masc   neut
> HaGenLex           6409   4702   1723
> CELEX+HaGenLex    23311  15846  10064
Altogether, the effect of usage (masculine nouns seem to be used more
frequently than their frequency among stems would dictate) appears to
be considerably greater than the effect of compounding, but this is
just a rough order-of-magnitude impression; it would take quite a bit
of work to unravel the impact of these factors across genres and
styles.
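The rough comparison behind that impression can be sketched as follows, setting the masc-to-fem ratio among lexicon types against the der-to-die ratio among article tokens. All figures are copied from this thread; the variable and function names are illustrative, and using der/die as a stand-in for masc/fem usage is an assumption of the sketch (der also serves as a feminine genitive/dative and genitive plural form, and die as the plural form), so the result is indicative only.

```python
# Type counts from the two lexica quoted above.
LEXICON_TYPES = {
    "HaGenLex": {"fem": 6409, "masc": 4702, "neut": 1723},
    "CELEX+HaGenLex": {"fem": 23311, "masc": 15846, "neut": 10064},
}

# Article token counts reported earlier in this message.
ARTICLE_TOKENS = {
    "Project Gutenberg": {"die": 242894, "der": 238893, "das": 106332},
    "Der Spiegel": {"die": 284777, "der": 265051, "das": 86214},
    "Frankfurter Rundschau": {"die": 497189, "der": 501637, "das": 143069},
}

def ratio(counts, numerator, denominator):
    """Return counts[numerator] / counts[denominator]."""
    return counts[numerator] / counts[denominator]

# masc : fem among lexicon types
type_ratios = {name: ratio(c, "masc", "fem")
               for name, c in LEXICON_TYPES.items()}

# der : die among running-text tokens (loose proxy, see lead-in)
token_ratios = {name: ratio(c, "der", "die")
                for name, c in ARTICLE_TOKENS.items()}

for name, r in type_ratios.items():
    print(f"types  {name}: masc/fem = {r:.2f}")
for name, r in token_ratios.items():
    print(f"tokens {name}: der/die = {r:.2f}")
```

The type ratios come out at about 0.73 and 0.68, while the token ratios lie between about 0.93 and 1.01, which is the gap the paragraph above attributes to usage.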
Andras Kornai
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora