Corpora: RE: corpora: evidence and intuition

Patrick Hanks patrick at lingomotors.com
Thu Nov 1 21:27:28 UTC 2001


 
hi Michael -
 
Ah! one of the most core points about Corpus Linguistics ever made!
(or perhaps I should say "corest"?)
 
Of course, you're right. A corpus is only a collection of texts, when all
is said and done.  Explanations do not spring, fully formed without human
intervention, from a corpus -- nor even from a concordance.  Corpus data
needs to be interpreted (in different ways, for different applications).

 
But the question is, whose interpretation?  Whose intuitions?
TEFL Teachers, please tell: do students learn better if presented 
with A, or with B? - 
 
        A) Pre-sorted sets of concordance lines (maybe with carefully 
            crafted explanations already attached), or 
        B) Unorganized concordances, which they have to wrestle through,
            forming their own hypotheses and imposing their own order.
 
Obviously A is quicker  -- but, if time is not a problem, is B more
effective?
 
Patrick
 
 

-----Original Message-----
From: Michael Rundell [mailto:michael.rundell at dial.pipex.com]
Sent: Thursday, November 01, 2001 3:49 PM
To: Patrick Hanks
Cc: corpora at hd.uib.no
Subject: corpora: evidence and intuition



Patrick (and list members) I wanted to jump into this discussion
earlier so I'm glad you have now joined it. 

Your point about the possible non-salience of copula verbs (sales
totaling $100) struck a chord - I still remember my first "discovery" on
looking at the COBUILD corpus (circa 1982) was that "represent" often
appeared as V+C in expressions like "this represents a major
breakthrough" - yet none of the English pedagogical dictionaries had
spotted it up to that point.

To a degree at least, some of these oddities are explained by corpus
composition - totaling a car might come up in unscripted American speech
(or maybe in a movie like "Clueless") but I wouldn't expect to find it
in your corpus (BNC - purely British - plus Reuters and AP - news text,
I assume); and conversely corpora like AP and WSJ are bound to have an
awful lot of "revenues totaling $50m" etc. That'd also explain some of
the other contributions (e.g. John Williams on radio station collocating
so often with "seize" and "take over": suspect the source here [Bank of
English] is a tad overweight in journalistic texts).

But of course there's more to it than this.

The thing I wanted to add, tho, was to slightly re-phrase Sebastian's
original question from 

-what makes you say "Wow, I wouldn't have thought that" to

"Wow, I wouldn't have thought OF that" (if I hadn't looked in the
corpus)

-meaning... : most of the time (not all, of course) the corpus reveals
something we sort of already knew but could not retrieve through the
unreliable process of introspection: i.e., when I saw that use of
"represent" it wasn't that I'd never heard of it before (far from it) -
so often, our response is more like "Of course, why didn't I think of
that?!". 

People doing corpus lexicography do indeed find they are subtly (and
sometimes not so subtly) tweaking the description of English in their
dictionaries, almost daily, to reflect insights that could only have
been gleaned from a good corpus - but on the whole these insights do not
actually "surprise" us (imho). 

Here's an example. It looks like CORE is now becoming an adjective (as
well as a noun&verb). We're all familiar with the noun-modifier use
beloved of management gurus (core business/competences/values etc) but
now we're seeing even more adjective-like signs (e.g. this is absolutely
core; core to this design is a sense of ...). So the evidence suggests
we shd add a new word class. That's great, and I seriously doubt we
could have recognised this without corpus data - but is it really a
"surprise"? 

In fact I'm slightly suspicious of people who claim to be continually
"surprised" by what they find in corpora (of their own native languages
anyway) - it suggests to me their intuitions aren't very good. (At
least, as far as lexical data goes; I'm persuaded by some other
contributions, e.g. John McKenny's point about "would", that we are
probably not all that good at predicting the relative frequency of
grammatical systems.)

I know intuition is a dirty word in some circles, but I think we need to
*completely* distinguish it from introspection (i.e. where you just try
to retrieve data from your own mental lexicon - this of course IS
demonstrably unreliable). Could we say in this context intuition is the
faculty by which humans interact with and interpret corpus data? All I
know is, you don't get far without it in lexicography. Having worked
with/hired/trained/been trained by maybe 150-200 lexicographers over the
years, I would bet my last shirt that someone with lousy intuition,
given the best linguistic resources and software in the universe, would
produce a much worse dictionary than someone with great intuitions and
just a modest corpus with basic software - would you agree Patrick (and
others)?

Michael Rundell
