[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R'-- re Louw's endorsement

Thu Aug 14 11:49:21 UTC 2008

Dear All and Wolfgang
Wolfgang, as always, has brought out the big guns of philosophy and
language to discuss the state of linguistics research.

Wolfgang's 'attack' on the utilitarians who are very happy to follow the
DARPA-initiated dictum of 80:20 (if it works 80% of the time, then the
algorithm is good enough for language processing) is rather
uncharacteristic and a little unkind.  One can see that the utilitarian
use of ethics in medicine and animal welfare has rekindled interest in
the hitherto ivory-towerish notions of rights and duties.  People now
write more thanks to word processing and get their spelling checked using
much of the 80:20 software.

His note on isolationist tendencies amongst linguists is worth noting as
well: The isolation and containment of theoretical stances that are
disliked by one group of academics has a long history. However, this
apparently emotive attitude to 'my theory' (and 'all theories but yours')
reassures me that almost all researchers are human after all and have
emotions - something automatons cannot have.

It is interesting to note the rise of punitive terminology and the
marginal militarization of language: bootcamps? Google bombs? What next?
However, there is some hope, as I saw in an announcement I just found
while looking for 'bootcamp' on Google:

The iQ Boot Camp is Europe's leading boutique web conference, designed for
organisations (http://www.iqcontent.com/events/bootcamp/about-bootcamp).

'Boutique' and 'bootcamp' don't quite collocate.  Perhaps we can all
discover our softer side when naming our conferences/workshops.

Thank you Wolfgang and good luck Prof Gries.

>
>
> Dear All,
>
> I find the interaction between Bill Louw and Stefan Gries on this list so
> exciting that I cannot resist the temptation to contribute to it. Of
> course Bill Louw gets it wrong by expecting a bootcamp to be anything like
> a conference. The corpus tells us that a boot camp is
>
> another word for a military training camp which was used during World War
> II and many other wars
>
> a very strict, highly structured facility with staff that act as drill
> instructors
>
> more like a review that prepares you for an exam.
>
> Dagmar S. Divjak's and Stefan Gries' boot camp is, as I see it, not about
> discussing corpus linguistics, but rather tells participants
>
> how to generate frequency lists;
>
> how to search for words and patterns;
>
> how to handle corpora and perform corpus-linguistic searches that typical
> corpus software does not support;
>
> how to carry out basic statistical evaluations of corpus data
> (significance tests and statistical graphs).
>
> Gries claims that statistics clearly plays a subordinate role in this
> syllabus, but also that R-based software tools will be made available that
> allow one to easily perform many of the above operations. The title of the
> event is: "Quantitative Corpus Linguistics with R." The provider of this
> software tells us: "R is a free software environment for statistical
> computing and graphics." (http://www.r-project.org/)
>
> For R-software, it does not matter what kind of strings of information bits
> are processed. It could be language, but it could also be DNA sequences or
> the digits after the "3." in the number pi. To me it seems that much of
> what will be presented at the camp is relatively application-free.
> Language is just one of many possible applications. What is not discussed
> is what a morpheme is, what makes a sentence a sentence, or how we can
> measure language acquisition. What is not mentioned is meaning.
>
> But then we have to remember that Stefan Gries wears at least two hats.
> The journal he co-edits bears the name Corpus Linguistics and Linguistic
> Theory. The only language theory that Gries accepts is cognitive
> linguistics. His homepage leaves us in no doubt. Meaning, for Gries, is a
> theoretical and therefore a cognitive concept. It plays no role in his
> version of corpus linguistics.
>
> Old-fashioned corpus linguists like myself have to accept that the label
> corpus linguistics has, over the last decade, been hijacked by theoretical
> linguists of all feathers. What used to be and still is for some of us a
> radically different, a new way to look at language, has been foreshortened
> to a bunch of methods, a toolbox to "search for words and patterns." Its
> role is to provide empirical data that will then be interpreted from the
> theoretical platform of cognitive linguistics. Corpus linguists are not
> innocent of this trend. At home in applied linguistics, they have often
> shied away from formulating the fundamental difference between the two
> approaches: For cognitive linguists, meaning is in the individual, monadic
> minds of speakers and hearers; for corpus linguists, meaning is in the
> discourse (or the corpus, as a sample thereof).
>
> For Bill Louw, the inspirational theoretician of my version of corpus
> linguistics, collocation, and certainly not statistics, is at the very
> heart of meaning. It is how meaning configures itself within a text and
> within the discourse. It relates a phrase we find in a text to the
> discourse at large. It allows us to investigate meaning through
> intertextual links and through paraphrase. It does not supply us with a
> hypothetical model of the meaning of a phrase, as cognitive linguistics
> does. Rather it presents the evidence of the meaning itself. It is then up
> to the interpretive community to make sense of it. Language is symbolic.
> Meaning has to be negotiated. It is irreducible to neurons firing in our
> brains.
>
> Cognitive linguistics tells Stefan Gries what a morpheme, a word, a phrase
> or a pattern is. This, then, is his input into the toolbox that he and
> many others now call corpus linguistics. Corpus linguists still don't know
> what a morpheme, a word, a phrase or a pattern is. That is why they always
> insist on discussing collocation. But they know that words change their
> meaning. There would be no innovation without the re-interpretation of
> what is there. Stefan Gries' brand of corpus linguistics may well be our
> brave new world. It is, however, not John Sinclair's corpus linguistics.
>
>
>
> Wolfgang Teubert
>
>
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

Khurshid Ahmad

Professor of Computer Science
Department of Computer Science
Trinity College,
DUBLIN-2
IRELAND
Phone 00 353 1 896 8429

Web Page: http://people.tcd.ie/kahmad