Corpora: Corpus frequencies for psycholinguistic experiments
Philip Resnik
resnik at umiacs.umd.edu
Thu Jul 13 19:32:52 UTC 2000
A colleague has asked me whether there are alternatives to the widely
used Francis and Kucera frequencies that could be used in controlling
for word frequency in a psycholinguistics experiment. Although there
are plenty of English corpora out there from which it's easy to
generate word counts, it occurs to me to wonder whether anyone else
has already addressed this issue.
It seems to me that the criteria for selecting a corpus to use as a
basis for word frequency data in a psycholinguistics setting would be
that it be (a) large and (b) either as unspecialized as possible or at
least "balanced" to whatever extent is possible. The latter might
arguably rule out most of the available corpora because they consist
primarily of newswire. Is this why F&K is still so widely used?
Again, let me emphasize that the question is whether or not there is
an alternative to F&K specifically for use in psycholinguistics,
e.g. for controlling for frequency. I'd suggest that replies go to me
personally, and I can post a summary if there's interest.
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall            UMIACS phone:      (301) 405-6760
University of Maryland           Linguistics phone: (301) 405-8903
College Park, MD 20742 USA       Fax:               (301) 405-7104
http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu