[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R'--re Louw's endorsement

Linas Vepstas linasvepstas at gmail.com
Mon Aug 18 19:51:50 UTC 2008


2008/8/18 Geoffrey Williams <geoffrey.williams at univ-ubs.fr>:
>
> However, I also know that I am a corpus linguist, I do not do Natural
> Language Processing, nor Cognitive Linguistics, not because I am not interested,
> nor because I consider them irrelevant, but because I am primarily interested in
> language in the corpus. To quote John Sinclair, I « Trust the Text ». Trusting
> the Text is what corpus linguistics is all about. It is instructive to go back
> to the writings of Firth who refuted all mentalism.

Hmm.  I'm a mathematician by vocation, and an utter
novice in linguistics.  For me, everything looks like math,
and of course, I naturally refute everything that isn't math.

A large corpus of text allows me to model that text with
mathematical expressions. At the most basic level, these
are low, vile statistical measures. At the next level, these
statistical measures allow me to discern patterns and
structures. I perceive these patterns as "lambda
expressions", but the linguistics community calls them
"parsers". Olde-fashioned parsers were hand-built by
means of "mentalism", and judged by means of
mentalistically-annotated reference corpora.

Newer work tries to discover these patterns automatically,
de novo, from text, with minimal a priori assumptions.
The cognitive folks, such as John Sowa, are trying to find
patterns within patterns -- with the eventual goal of extracting
meaning, in the sense of building a generally intelligent
machine that can listen and talk -- talk properly, or hep-cat
prosody.  Access to a large body of text is essential to this
effort.

> However, saying that Cognitive linguist accepts corpus linguistics does sound
> rather pretentious. I am glad they accept our existence, but saying so sounds a
> bit like the so-called Unification Church that likes to take bits from a
> variety of religions whilst respecting the basic tenets of none.

It doesn't just "sound like", but rather "it is", and very
intentionally so.  The only basic tenet is "make it work",
in the sense of "all is fair in love and war". Using a
hodge-podge of techniques, borrowed and bastardized
"crown jewels" from one discipline or another: that's
what it's about.  The theft is not only from various branches
of linguistics -- it's agnostic as to where the ideas come from,
and they come from everywhere.

--linas
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


