<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META content="MSHTML 5.00.2919.6307" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>Hi
Michael -</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001></SPAN></FONT> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>Ah! one of the most core points about
Corpus</SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001> Linguistics ever made!</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>(or
perhaps I should say "corest"?)</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001></SPAN></FONT> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>Of
course, you're right. </SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>A corpus is only a collection of texts, when all is
</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>said
and done. </SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001> Explanations </SPAN></FONT><FONT color=#0000ff
face=Arial size=2><SPAN class=903445320-01112001>do not spring, fully formed,
without human </SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>intervention, from a corpus -- nor
</SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>even from a concordance.
Corpus</SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001> data </SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>needs
to be interpreted (in different </SPAN></FONT><FONT color=#0000ff face=Arial
size=2><SPAN class=903445320-01112001>ways, for different applications).
</SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>But
the question is, whose interpretation? Whose
intuitions?</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>TEFL
Teachers, please tell</SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>: do students learn better if presented
</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN class=903445320-01112001>with
A, or with B? - </SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001></SPAN></FONT> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001> A)
Pre-sorted sets of concordance</SPAN></FONT><FONT color=#0000ff face=Arial
size=2><SPAN class=903445320-01112001> lines (maybe with carefully
</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>
crafted explanations already attached), or </SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>
B) U</SPAN></FONT><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>norganized concordances, which they have to wrestle
through, </SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>
forming their own hypothe</SPAN></FONT><FONT color=#0000ff face=Arial
size=2><SPAN class=903445320-01112001>ses</SPAN></FONT><FONT color=#0000ff
face=Arial size=2><SPAN class=903445320-01112001> and imposing
their own order.</SPAN></FONT></DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001></SPAN></FONT> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>Obviously A is quicker -- but, if time is not a
problem, is B more effective?</SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001>Patrick</SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><FONT color=#0000ff face=Arial size=2><SPAN
class=903445320-01112001></SPAN></FONT><FONT color=#0000ff face=Arial
size=2><SPAN class=903445320-01112001></SPAN></FONT> </DIV>
<BLOCKQUOTE style="MARGIN-RIGHT: 0px">
<DIV align=left class=OutlookMessageHeader dir=ltr><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Michael Rundell
[mailto:michael.rundell@dial.pipex.com]<BR><B>Sent:</B> Thursday, November 01,
2001 3:49 PM<BR><B>To:</B> Patrick Hanks<BR><B>Cc:</B>
corpora@hd.uib.no<BR><B>Subject:</B> corpora: evidence and
intuition<BR><BR></FONT></DIV>
<DIV>
<P>Patrick (and list members), I wanted to jump into this discussion earlier,
so I'm glad you have now joined it. </P>
<P>Your point about the possible non-salience of copula verbs (sales totaling
$100) struck a chord - I still remember my first "discovery" on looking at the
COBUILD corpus (circa 1982) was that "represent" often appeared as V+C in
expressions like "this represents a major breakthrough" - yet none of the
English pedagogical dictionaries had spotted it up to that point.</P>
<P>To a degree at least, some of these oddities are explained by corpus
composition - totaling a car might come up in unscripted American speech (or
maybe in a movie like "Clueless") but I wouldn't expect to find it in your
corpus (BNC - purely British - plus Reuters and AP - news text, I assume); and
conversely corpora like AP and WSJ are bound to have an awful lot of "revenues
totaling $50m" etc. That'd also explain some of the other contributions (e.g.
John Williams on radio station collocating so often with "seize" and "take
over": I suspect the source here [Bank of English] is a tad overweight in
journalistic texts).</P>
<P>But of course there's more to it than this.</P>
<P>The thing I wanted to add, tho, was to slightly re-phrase Sebastian's
original question from </P>
<P>-what makes you say "Wow, I wouldn't have thought that" to</P>
<P>"Wow, I wouldn't have thought OF that" (if I hadn't looked in the
corpus)</P>
<P>-meaning… : <B>most</B> of the time (not all, of course) the corpus reveals
something we sort of already knew but could not retrieve through the
unreliable process of introspection: i.e., when I saw that use of "represent"
it wasn't that I'd never heard of it before (far from it) - so often, our
response is more like "Of course, why didn't I think of that?!". </P>
<P>People doing corpus lexicography do indeed find they are subtly (and
sometimes not so subtly) tweaking the description of English in their
dictionaries, almost daily, to reflect insights that could <B>only</B> have
been gleaned from a good corpus - but on the whole these insights do not
actually "surprise" us (imho). </P>
<P>Here's an example. It looks like CORE is now becoming an adjective (as well
as a noun &amp; verb). We're all familiar with the noun-modifier use beloved of
management gurus (core business/competences/values etc) but now we're seeing
even more adjective-like signs (e.g. this is absolutely core; core to this
design is a sense of …). So the evidence suggests we shd add a new word class.
That's great, and I seriously doubt we could have recognised this without
corpus data - but is it really a "surprise"? </P>
<P>In fact I'm slightly suspicious of people who claim to be continually
"surprised" by what they find in corpora (of their own native languages
anyway) - it suggests to me their intuitions aren't very good. (At least, as
far as <B>lexical</B> data goes; I'm persuaded by some other contributions,
e.g. John McKenny's point about "would", that we are probably not all that
good at predicting the relative frequency of grammatical systems.)</P>
<P>I know intuition is a dirty word in some circles, but I think we need to
*completely* distinguish it from introspection (i.e. where you just try to
retrieve data from your own mental lexicon - this of course IS demonstrably
unreliable). Could we say in this context intuition is the faculty by which
humans interact with and interpret corpus data? All I know is, you don't get
far without it in lexicography. Having worked with/hired/trained/been trained
by maybe 150-200 lexicographers over the years, I would bet my last shirt that
someone with lousy intuition, given the best linguistic resources and software
in the universe, would produce a much worse dictionary than someone with great
intuitions and just a modest corpus with basic software - would you agree,
Patrick (and others)?</P>
<P>Michael Rundell</P></DIV></BLOCKQUOTE></BODY></HTML>