Corpora: BNC word Frequency List
Paul Rayson
paul at comp.lancs.ac.uk
Wed Oct 18 12:55:04 UTC 2000
Neil,
> I remember reading some time ago that a word frequency list for the BNC had
> been produced.
>
> Could anybody tell me how to get hold of this?
There was a summary posted by Philip Resnik in July, part of which
follows.
Regards,
Paul.
1a. British National Corpus (http://info.ox.ac.uk/bnc/)
The corpus itself is available only to Europeans, but Adam
Kilgarriff has produced word frequency lists and put them on the
Web at http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/bnc-readme.html.
He writes, "the lists from the BNC on my web page - particularly
the lemmatised ones - were produced with English teaching and
dictionaries in mind, and have been quite widely used for
experiment-type purposes. The BNC is clearly appropriate, as it
was designed with 'general English' in mind. (though it is
British, but I suspect the differences there are quite marginal.)
It's been getting 200 files downloaded per month for 4 years now,
and I think it is quite widely used."
Adam's paper
@article{ak-ijl,
  author  = "Adam Kilgarriff",
  title   = "Putting Frequencies into the Dictionary",
  journal = "International Journal of Lexicography",
  year    = 1997,
  volume  = 10,
  number  = 2,
  pages   = {135--155}
}
argues for the list and explains how it was done, and there's an
on-line copy available from his Web page.
Paul Rayson has been working on the BNC and writes:
I have been working on frequency lists for the second version of
the BNC (POS tagging and file headers updated) and short versions
of those lists will appear in
Leech, G., Wilson, A., Rayson, P. (forthcoming). Word Frequencies
in Spoken and Written English: based on the British National
Corpus. Longman, London.
Due to the size of the lists, we plan to make the longer versions
available on the UCREL website later this year when the book is
published.
http://www.comp.lancs.ac.uk/ucrel/