Corpora: Readings in Corpus Linguistics

Mon Sep 3 08:49:11 UTC 2001

Dear Corpora Listers,

As mentioned earlier this summer, Anne Wichmann and I are now definitely
planning to edit a book of Readings in Corpus Linguistics, and we are
beginning to negotiate with publishers.  This message is an invitation to
the corpus community to help us make it a useful publication.

We certainly haven't found it difficult to assemble lists of items worthy
of inclusion; in fact, we already have a set of possibles which is quite
a bit too long and will have to be cut down.  But, even so, we could very
easily have overlooked crucial papers which have done far more to define
or advance the field than some of those we are thinking of including.
Many of you will have personal favourite items in the literature, perhaps
gems that you know about but which appeared in hard-to-get-hold-of places,
or which are accessible but don't seem to be appreciated as widely as they
deserve.  Anne and I would really like to hear your nominations --
obviously we won't necessarily include them, but we certainly ought to
consider them.  For that matter, we'd like to hear your ideas about papers
that everyone knows and recognizes as fundamental:  you may think they are
too obvious to mention, but we could well have forgotten about them.

Our interpretation of "Corpus Linguistics" is a broad one.  We want the
book to cover both technical NLP and humanities approaches.  During
our careers, the subject has evolved from a "minority of a minority"
speciality into a major concern of linguistics and computing departments,
and it has been changing and developing almost explosively over the last
decade.  As a result, we have the impression that there are many people
who have been drawn in recently and are in the situation of the blind men
with the elephant -- they have learned about the bit of the subject they
deal with directly, but they feel at sea about the overall purview of
the discipline, and find it hard to read the literature because they don't
yet have much perspective on where the subject has come from or how the
various aspects relate to one another.  We want to produce a book that
gives that perspective.  Thus we are especially keen to include
papers that bridge the divide between technical matters like XML, stats,
or automatic parsing, and humanistic considerations such as literary
language, language teaching, sociolinguistics, or historical linguistics.

We also want so far as possible to find short pieces, so that we can
introduce many different topics without the book becoming too big.  And
we aim to include a leavening of papers that are entertaining as well
as informative; we hope the book may help to attract new recruits to the
discipline by showing them that corpus linguistics is fun.

Looking through our own tentative list of "possibles", topics which seem
thinly covered so far include semantics, and "World Englishes and
nonstandard dialects" -- we have one or two possibles for each, but we
are still looking for "killer papers".  And of course our list of twenty
or so corpus linguistics areas may itself have overlooked important
topics.

Please share your opinions with us!

Geoffrey Sampson

G.R. Sampson, Professor of Natural Language Computing

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs at cogs.susx.ac.uk
tel. +44 1273 678525
fax  +44 1273 671320
web http://www.grsampson.net