[Corpora-List] Corpus vs Intuition

Gill Philip g.philip.polidoro at gmail.com
Thu Sep 18 07:50:30 UTC 2008


Dear all,

unwittingly, it seems, we're again falling into the trap of not
distinguishing between introspection and intuition. All linguists can, do
(and must) use their intuition when analysing language; you can take or
leave introspection.

Corpora provide a source of data which originates in the intuitive use of
language by 'ordinary' speakers. This use is not introspective: even if we
consider that most corpus data in current use is written rather
than transcribed, and so will have undergone a degree of revising and
refining by the original authors, the kind of thinking about language that
is necessary for perfecting one's writing style is not the same as an
armchair linguist's delving for data in his/her own head, in which the
introspective part filters out a lot of what is intuitive. Of course, when I
say armchair linguist, I'm not excluding non-professionals. I really mean
anyone who makes claims about what they personally do, or what an ideal
speaker does, or what the speaker community does.

Working with figurative language and phraseology, I find that a lot of what
can be discovered by corpus analysis is confirmed and complemented by
psycholinguistic research; sometimes the concordances explain the whys and
wherefores, and other times it is the experimental data that explains
oddities in the concordances. I also find that many corpus linguists and
psycholinguists are united in their dissatisfaction with the claims made by
their theoretical linguist, language philosopher, and/or cognitive linguist
colleagues, who prefer to introspect for their data. Both corpus and
psycholinguistic scholars highlight the fact that introspectively-derived
data inevitably leads to oversimplification, and oversimplification amounts
to getting the facts wrong an awful lot of the time. Of course some of the
facts will be accurate - and this is why people still introspect, because if
it never worked, they'd have given up on it ages ago - but the finer details
tend to escape notice.

That said, convincing an introspector to use a corpus instead is probably as
difficult as convincing somebody who truly believes the world is flat that
it is a sphere... or like convincing a corpus user to introspect his/her
concordances. Maybe in 300 years' time the paradigm shift will have taken
place, but I for one do not plan to hold my breath. Live and let live: but
we have much more interesting and exciting data ;-)

Gill



On 17/09/2008, Martin Wynne <martin.wynne at oucs.ox.ac.uk> wrote:
>
> The paper which you cite may well focus explicitly on this area in a
> useful way, but of course anyone who uses corpus data always uses both
> intuition and corpus data. Or has anyone worked out how to turn their
> intuition off? (I guess you could try to do corpus linguistics on a
> language that you don't know, but I don't imagine you'd get very far.)
> The point is surely that a corpus linguist doesn't rely on intuition
> only, but checks intuitions against corpus data.
>
> And thus, (to be a bit more provocative) by constantly testing, refining
> and sometimes refuting one's intuitions, the corpus linguist can become
> better at knowing when and when not to trust introspection, and can
> become a much better armchair linguist than the armchair-bound linguist.
>
> Martin Wynne
>
> nina at scils.rutgers.edu wrote:
> > James Pustejovsky and Anna Rumshisky have recently published an article
> > advocating an approach that uses both intuition *and* corpus data to
> > develop and test linguistic theory.
> >
> > Pustejovsky, J., & Rumshisky, A. (2008). Between Chaos and Structure:
> > Interpreting Lexical Data Through a Theoretical Lens. Int J
> > Lexicography, 21(3), 337-355. doi: 10.1093/ijl/ecn023.
> > Abstract: In this paper, we explore the inherent tension between corpus
> > data and linguistic theory that aims to model it, with particular
> > reference to the dynamic and variable nature of the lexicon. We explore
> > the process through which modeling of the data is accomplished,
> > presenting itself as a sequence of conflicting stages of discovery.
> > First-stage data analysis informs the model, whereas the seeming chaos
> > of organic data inevitably violates our theoretical assumptions. But in
> > the end, it is restrictions apparent in the data that call for
> > postulating structure within a revised theoretical model. We show the
> > complete cycle using two case studies and discuss the implications.
> >
> >
> >
> > Mai Zaki wrote:
> >
> >> Dear colleagues,
> >>
> >> My question may be a bit basic but I would appreciate your feedback.
> >> My own research is corpus-based and I am working in the field of
> >> lexical semantics/pragmatics where the majority of the literature is
> >> based on made-up examples and testing of native speakers' intuitions.
> >> So, I still get stuck in my discussions with others trying to convince
> >> them that corpus work and real life examples add a different angle to
> >> any research. The usual objections are that it's not about numbers and
> >> percentages, and that patterns of use are questionable because of the
> >> issue of how representative they are.
> >> I want to include in my own work a part about the advantages of
> >> corpus-based work as opposed to arm-chair linguistics, and I would
> >> appreciate if you could guide me to any references on this topic as
> >> well as your own ideas.
> >>
> >> Thank you.
> >>
> >> Mai Zaki
> >> Middlesex University
> >>
> >> _______________________________________________
> >> Corpora mailing list
> >> Corpora at uib.no
> >> http://mailman.uib.no/listinfo/corpora
> >>
>



-- 
*********************************
Dr. Gill Philip
CILTA
Università degli Studi di Bologna
Piazza San Giovanni in Monte, 4
40124 Bologna
Italy