Corpora: Corpus frequencies for psycholinguistic experiments

Philip Resnik resnik at umiacs.umd.edu
Thu Jul 13 19:32:52 UTC 2000


A colleague has asked me whether there are alternatives to the widely
used Francis and Kucera frequencies for controlling word frequency in
a psycholinguistics experiment.  Although there are plenty of English
corpora out there from which it's easy to generate word counts, I
wonder whether anyone has already addressed this issue.

It seems to me that the criteria for selecting a corpus to use as a
basis for word frequency data in a psycholinguistics setting would be
that it be (a) large and (b) either as unspecialized as possible or
at least "balanced" to whatever extent is possible.  The second
criterion might arguably rule out most of the available corpora,
because they consist primarily of newswire.  Is this why F&K is still
so widely used?

Again, let me emphasize that the question is whether or not there is
an alternative to F&K specifically for use in psycholinguistics,
e.g., for controlling word frequency.  I'd suggest replying to me
personally; I can post a summary if there's interest.

  Philip
  ----------------------------------------------------------------
  Philip Resnik, Assistant Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA	   Fax   : (301) 405-7104
  http://umiacs.umd.edu/~resnik	   E-mail: resnik at umiacs.umd.edu
