Corpora: "must have" lists

Geoffrey Sampson geoffs at cogs.susx.ac.uk
Tue Jun 19 16:10:19 UTC 2001


Lou suggests organizing a "reader" of classic articles in corpus linguistics
as an electronic corpus.  Hmm ...

I feel a bit sceptical about that for two reasons.  One is that I doubt that
the primary purpose (of helping newcomers "read themselves in" to the field)
would be achieved well by an electronic corpus of articles -- people like
reading off paper bound into journals and books, not off the screen.  Besides,
publishers would be less enthusiastic about publishing a collection
(and copyright permissions might be harder to get) if the material were also
being made available electronically.  (I know there are exceptions, such
as the _State of the Art in Human Language Technologies_ book available both on
the Web and from CUP -- but I think they will always be exceptions rather
than the norm.)

Also, I am not sure that Lou's second purpose, of providing a source from
which one could monitor the development of terms of art in the field, would
really be achieved all that well by a collection of the N classic readings
in the history of corpus linguistics.  Lou knows a lot more about lexicography
than I do, but it seems to me that the limited number of items one would
most want to encourage newcomers to read would not necessarily coincide with
the texts that best exemplified the development of terminology -- for that,
would a larger bulk of less-exciting items not be more informative?

But I certainly agree with Lou that it would be interesting to see how far
people's "must have" lists coincided.  Since the note of mine to which
Lou is responding, Anne Wichmann and I have discussed whether we might
actually propose a collection like this to a publisher -- I'm not sure
whether either of us is yet clear that we want to commit ourselves to the
effort, but we are clear that one desirable thing would be to use the
Corpora List to get people to propose their personal Top N lists.  I had
thought we would probably wait till we actually got to the stage of
putting a synopsis in front of a publisher, if we ever do -- but I suppose
since Lou has raised the idea, people might want to have fun over the
summer putting together such lists!  Mine would include
the article from _ICAME News_ by ??Stig Johansson and Geoff Leech?? about
significant vocabulary differences between British and American English,
and the one from a book edited by Nelleke Oostdijk, by ?Ken
Church and Bill Gale?, "What is wrong with adding one?" -- but I haven't
started seriously working out a proper list.

Geoff


G.R. Sampson, Professor of Natural Language Computing

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs at cogs.susx.ac.uk
tel. +44 1273 678525
fax  +44 1273 671320
web http://www.grsampson.net


