[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R' -- re Louw's endorsement

Linas Vepstas linasvepstas at gmail.com
Thu Aug 28 16:18:08 UTC 2008


Hi,

2008/8/28 Wolfgang Teubert <w.teubert at bham.ac.uk>:
>
> You mention qualia,

I had read something which said (to paraphrase) "a concept or lexis
does not exist except as a negotiated meaning within a corpus",
which can be dangerously misunderstood -- let me rephrase this
as a question: "Can meaning exist outside of the corpus of negotiated
conversation?"  This has two obvious answers, "no" and "yes",
both of which are correct, depending on what the meaning of "meaning"
is. In the strict sense of corpus linguistics, the answer is "no": there
can be no meaning except that which is found in language.  And with
this, I agree.

Yet, it seems, we humans can talk about many things of non-linguistic
origin, of which "qualia" was meant to be a primordial example. We
map the linguistic concept of "pain" onto that quale we call "pain".
The feeling itself exists outside of the discourse; it is an object of
discourse.  Thus, I conclude that meaning does not exist in a vacuum,
but exists in reference to the thing being talked about.  What's more,
that reference exists only inside of us, and not in the corpus itself.

So, while I can run corpus tools on a corpus, and find many interesting
statistical correlations for the word "pain", is it correct to call the
sum-total of these statistical correlations "the meaning of pain"?
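
(By "statistical correlations" I mean things like collocation counts.
Here is a minimal sketch, in Python; the file name "corpus.txt" and
the window size are purely illustrative assumptions:

    from collections import Counter

    # Count the words co-occurring with "pain" within a +/-4-word
    # window: the simplest sort of collocation statistic.
    tokens = open("corpus.txt").read().lower().split()
    window = 4
    collocates = Counter()
    for i, tok in enumerate(tokens):
        if tok == "pain":
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            collocates.update(t for t in tokens[lo:hi] if t != "pain")

    # Print the ten most frequent collocates of "pain" in this corpus.
    for word, count in collocates.most_common(10):
        print(word, count)

One could go on to weight these counts by mutual information and
the like; the question stands either way.)
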
Because, when you say:

> Meaning is only in the discourse.

that is what you seem to imply: that meaning is nothing more than
the statistical vagaries of a text.  Given that there is an immense
amount of structure in text, then, indeed, maybe we can believe
that meaning is nothing more (and nothing less!!) than
the structure of text.  But then, in your next sentence:

> It is what is exchanged between and shared by people.

Ahh! But how do I exchange and share my thoughts about "pain"?
I personally draw upon a font of qualia, and this font shapes the
words that I choose to use when talking about "pain".  When I learn
a new language, I learn the "meaning" of its words; but having
learned these, I bring my own experience into play when I use
these words.

It is along similar lines of reasoning that some AI folks now insist
that intelligence can't be disembodied: no amount of statistical
correlation taken from text will truly capture "meaning"; one must
attach the machine to sensors -- sight, touch, movement -- so that,
when one says to it, "this is a table", it can see and touch it.  It,
too, can take "the meaning of table" that was data-mined out of a
large corpus, and attach that meaning to the thing that is
sensed.

When I say "AI" I don't mean "human level intelligence". If you've
followed the 2005 DARPA Grand Challenge, you know that we
now have automobiles that can drive themselves, and it's only
cost and lawyers preventing them from going mass-market. If
you follow the news from Iraq, you might know that some soldiers
are now running around with verbal (voice-reco + speech-generation)
hand-held machine-translation devices.  You don't need a lot of
imagination to realize that you can hook up one of these
self-driving cars to the voice unit.  We really aren't very many years
away from being able to talk to our cars: "watch out for that table in
the middle of the road!", followed by some confusion as to whether
the mass detected by radar is a table or not.  Such a car would
have (much) less intelligence than a human child, yet
nonetheless it would be a talking car.

Should we collect a corpus of automotive utterances: things my
machine said to me?  Should we run statistical tools on this corpus?
Should we argue about what the machine meant, when it said "watch
out for that table!"?  Should we argue about the "meaning of meaning"
when we talk about the corpus of emulated English spoken by
machines?

Enough. Let me quibble a bit:

> no machine that has intentionality.

Well, the self-driving vehicle has the 'intent' of not hitting anything
as it intentionally moves from point A to point B.

> The result of summarisation is predictable;

But it may still be surprising. If I ask my talking, self-driving machine
to summarize the trip, it may well utter some unexpected remarks.

In physics, there are dynamical systems exhibiting "chaotic
behaviour": systems that are essentially unpredictable. These
systems can be very simple: a few weights, a spring, a magnet.
While deterministic (described by fixed mathematical
equations of motion), they remain unpredictable in practice,
because arbitrarily small differences in initial conditions
grow exponentially.
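
To make this concrete, here is a minimal sketch in Python, using
the logistic map -- the textbook example of a chaotic system; the
starting values are just illustrative. Two trajectories begin one
part in a million apart and soon disagree completely:

    # Logistic map: x_next = r * x * (1 - x); chaotic at r = 4.0.
    r = 4.0
    x, y = 0.200000, 0.200001  # initial conditions differing by 1e-6

    for n in range(1, 51):
        x = r * x * (1 - x)
        y = r * y * (1 - y)
        if n % 10 == 0:
            # The gap |x - y| blows up from 1e-6 to order one.
            print("step %2d: x=%.6f  y=%.6f  |x-y|=%.6f"
                  % (n, x, y, abs(x - y)))

Deterministic to the last digit, yet useless for prediction beyond
a few dozen steps.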

Now, computers are deterministic, doing whatever the
software tells them to do; but, as dynamical systems, they are
far, far more complex than a few springs and a magnet.
While in principle computers are "predictable", in practice
they are not: they only seem predictable because the
programmers have taken great pains to make sure that
their software does not surprise the user.

> All mechanical, rule-based ways for describing the meaning of 'table' (including the statistical devices developed by corpus linguists) cannot replace our collaborative interpretation of the word as it crops up in the discourse.

Why not?  All due respect to humans, but come the day we
have self-driving cars (with an IQ of 2) that can spot tables
in the middle of the road, and can talk about them, we seem to
reach a crisis.

--linas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


