[Corpora-List] Quantitive Corpus Linguistics

Fri Aug 22 18:10:18 UTC 2008

This is getting very close to my favourite topic: Edmund Husserl and his theory of meaning.  Unlike Leibniz and Kant, mentioned by Dom, Husserl does occupy the middle ground between empiricism and mentalism.  On the one hand, his slogan "To the things themselves" called for a quite empirical research programme in the 'cognitive science' of the early 20th century.  On the other hand, by 1920 it had turned into a programme of intersubjective mentalism: meanings exist as long as there are people ready to acquire and reproduce them.  According to Husserl, meanings are neither objective, as there is no Platonic realm of 'naked ideas', nor subjective, as you can't keep your own private version of Pythagoras' theorem or even of the word 'table' (another 'private language' argument invented before Wittgenstein).

Husserl's ideas relevant to semantics concern meaning constitution: a sign is endowed with a meaning in correlation with the current state of the life-world of the person interpreting it.  His second interesting idea, again relevant in our context, is that of time-consciousness: anticipation, constitution and sedimentation of mental images in the stream of interpretation.

Another philosopher to be mentioned in this context is Peirce, but, obviously, John Sowa can say more about the relevance of his contribution.

The usual problem I have found with putting philosophical ideas into our current research concerns their abstract nature.  Some sort of dynamic interpretation of source texts is quite common in computational linguistics; see Yorick's work on preference semantics.  Some sort of time-consciousness is relevant to linear parsing and memoization techniques.  The crucial question is whether we can get anything useful from philosophical ideas apart from metaphors.  For instance, it follows from Husserl that we acquire our life-world via interaction with the real world and society.  For linguistics, this means that a corpus is the ultimate source for acquiring ontologies and lexicons.  So what?  We didn't know this?  And the mechanisms we use to get a semantic lexicon from this ultimate source (e.g., distributional semantics) are very different from the ideas put forward by Husserl.

My puzzled 2p about the power of philosophy,
Serge

-----Original Message-----
From: corpora-bounces at uib.no on behalf of Dom Widdows
Sent: Fri 22/08/2008 05:03
To: Mike Maxwell
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Quantitive Corpus Linguistics

Dear All,

I certainly agree that studying the relevant philosophy has been an
important part of many (if not most) successful scientific endeavours,
though it can also mislead if applied in the wrong contexts (the same
can be said of mathematics).

Peter Helias is not someone I'd come across before, and he's not the
easiest to find out about online - I have started a stub Wikipedia
article (at http://en.wikipedia.org/wiki/Peter_Helias), but his
contribution to the theory of substance and accidents is still unclear
to me. Christian scholars often trace this through Aquinas (important
in the theory of transubstantiation - body and blood of Christ are
substance, bread and wine are accidents), and perhaps through
Augustine to Aristotle.
(I know most of this through dinner conversations with my father, so
don't really know the references well). A more pluralistic story might
be to trace the influence of Aristotle through Averroes and al-Farabi,
who certainly wrote some fascinating things on the way words would
become reused, formally or informally, to refer to many different but
related concepts - perhaps anticipating generative lexicon theory.

I'm surprised to hear the notion that "collocation is everything"
coming through a voice in this tradition; I haven't yet found such
arch-empiricist quotes from Helias himself (but need to find more
corpus data here!). I think of this "data is everything, there is no
need for a mind" attitude as associated with David Hume and the Scottish
enlightenment, sometimes described as a kind of reaction to Descartes'
"reason is everything" (or at least "I am a thing that thinks", as
contrasted with a thing that experiences and learns). Leibniz and Kant
are both supposed to have tried to find different middle-grounds
between these extremes. (Here I could probably find good quotes,
but it's getting late ... write to me if you want me to try and back
this up with sources.)

There are a couple of themes behind this ramble, honest ...

The first is that every branch and period of science struggles over
this learning vs. reasoning territory, and we are very much in the
midst of this struggle in computational linguistics. If we can learn
anything from the story of other sciences (even mechanics), corralling
one side or the other into putting their tools away never leads to the
full story.

Secondly, there is an Aristotelean theme throughout - Aristotle's
influence isn't opposed to Plato's, it emphasizes a framework for
learning in a world that still has a lot of underlying form to it.

On 8/21/08, Mike Maxwell <maxwell at umiacs.umd.edu> wrote:
> J Washtell wrote:
>  > I find it a bit optimistic (given my own intuitions of course. But I
>  > should say that I do not find it beyond the realms of possibility)
>  > that the evidence necessary to solve all of our linguistic and
>  > (unavoidably?!) cognitive-linguistic ponderings is to be found in the
>  > text (not in the brain, say, or in the extra-corporal context).

Hence I agree with this reservation - trying to find everything in the
text alone would be like Hume trying to find everything in the data
alone without any contribution from reasoning. (Please come out and
correct me, Hume scholars, if I'm out of line here.)

> Not to mention that if you limit yourself to studying things that
>  require large corpora, you rule out studying perhaps 99% of the
>  languages in the world.

This I'd disagree with - you can learn things about the structure of
language in general by considering available large corpora, and use
this knowledge to try and enhance what you can do with small datasets.
Linear B was a comparatively small corpus, but using knowledge of
classical Greek, it could be deciphered. Perhaps this is a canned
example since the languages are in a sense "the same" - but even for
completely unrelated languages, a good linguist uses information
learned about familiar languages to build expertise on language in
general, and can then apply this expertise and technique to fresh
languages with small amounts of data. It's only if corpus linguistics
explicitly rules out generalization that a strictly empiricist
approach leads to no cross-lingual extrapolation.

Best wishes,
Dominic

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
