FW: Using the BNC

Frank Abate abatefr at EARTHLINK.NET
Wed Dec 18 10:24:51 UTC 2002

What Michael Q observed below, in response to Jonathon G's point, just about
nails it, in brief, re the value of a corpus to lexos (lexicographers).  One
could go on, but MQ has captured the essence: a corpus allows research into
collocational patterns and other such phenomena that are at the heart of how
the language works.  These cannot be researched otherwise, at least not in
any sort of depth (with hats off to the BBI Combinatory Dict, etc.).

Other lexos, such as Sue Atkins and Patrick Hanks, are FAR more conversant
with the value of corpora to lexicography, so please seek them out -- or their
published works -- for more details and effusion.  Also, you could look at
the Intro to either the New Oxford Dictionary of English (UK), or its
transatlantic cousin, the New Oxford American Dictionary.  Both of these are
corpus-based, and, I think, the first general dicts to be corpus-based (the
Collins Cobuild is not a general dict, strictly speaking).

If I were to embark on a new general (not historical) dictionary project,
working from a "blank piece of paper", and were given a choice of having
either a good general corpus or a citation file, I would absolutely choose
the corpus -- in a heartbeat.  One can get citational evidence from OED,
MW3, etc., and can also do Googling and the like for specific word/sense
research.  But if you want to look at the core of the language down into the
nitty-gritty details, you can't beat a good, solid, contemporary corpus, as
long as it is extensive enough -- say, 100 million words AT LEAST.  The
bigger, the better.

Frank Abate

Jonathon Green wrote:

> what it does not do, however, is give a page number for the
> material cited. It gives a page range (presumably those read for
> the Corpus), e.g. 'pp. 62-165' and the number of 's-units' and the
> total word count, but, as I say, no page number as such. This, for
> my purposes, and I would imagine those of other lexicographers,
> renders it more interesting than practically helpful.

I don't use it (I don't have access to it), but I do use the cut-down
CD-ROM version that was produced some years ago. My understanding is
that its great value, as with other corpora, lies in the opportunity
it gives to identify and rank collocations and to assess the relative
importance and frequency of various forms in a balanced image of one
regional type of English.
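
As a rough illustration of what "identify and rank collocations" can
mean in practice, here is a minimal sketch in Python. It scores
adjacent word pairs by pointwise mutual information (PMI), which is
one common association measure and not necessarily what any BNC tool
uses. The function name rank_collocations, the min_count threshold
and the sample text are all invented for the example; a real corpus
would need proper tokenisation and far more data.

```python
import math
from collections import Counter


def rank_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not part of any corpus toolkit.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for very low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), count, math.log2(p_pair / (p_w1 * p_w2))))

    # Highest PMI first; ties broken by raw count, then alphabetically.
    scored.sort(key=lambda item: (-item[2], -item[1], item[0]))
    return scored


# Made-up sample text, standing in for a real corpus.
sample = (
    "strong tea and strong coffee but heavy rain and heavy traffic "
    "then strong tea again with heavy rain"
).split()

for pair, count, pmi in rank_collocations(sample):
    print(pair, count, round(pmi, 2))
```

On a toy text like this the ranking means little; the point of a
100-million-word corpus is that the counts become large enough for
such scores to be trusted.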

As others have mentioned, this is something that a search of Google
cannot so easily do, since there are all sorts of systemic biases in
the material that it indexes. Where Google scores over corpora,
however, is that it is a different kind of snapshot, one of current
English that is to a significant degree free from the strictures of
good taste and editing. I've found it immensely useful, for example,
when trying to judge whether a form has gained wide currency as a
folk etymology (chaise lounge, bare with me, without further adieux,

Michael Quinion
Editor, World Wide Words
E-mail: <TheEditor at worldwidewords.org>
Web: <http://www.worldwidewords.org/>
