[Corpora-List] QM analogy and grammatical incompleteness

Rob Freeman lists at chaoticlanguage.com
Tue Dec 20 03:31:16 UTC 2005


On Tuesday 20 December 2005 15:13, Dominic Widdows wrote:
> > Let us have a clear statement of their limitations, if limited they
> > are.
> >
> > In short, do you believe there is a limitation on knowledge
> > analogous to the Uncertainty Principle of QM, which applies
> > to the simultaneous characterization of text in terms of
> > grammatical qualities (defined distributionally)?
>
> Can you find two "observables" in grammar that can't in principle be
> measured together? Two observables that are measured in such a way that
> the measuring of one interferes directly with the measuring of the
> other? I think that is the question you should ask if you want to find
> a really convincing analogy, or alternatively discover that the model
> isn't really appropriate.

Almost any two clusterings of word associations into abstract grammatical 
categories will break the same usage down in different ways.

The point is that, in general, some of the data is used in both analyses, but 
it can't be part of both at once, so the analyses conflict. This will always 
occur in a distributional analysis (except one whose underlying distribution 
is fundamentally parameterized by rules).

An example from my own experience is:

He-PRON came-V only-DET yesterday-N
(lined up with "He came just yesterday"?)

He-PRON came-V only-ADV yesterday-N
(lined up with "He came early yesterday"?)

This is an example of an actual tagging dispute between myself and a colleague 
some years ago. Others on the list may have intuitions one way or another. 
That does not matter. The point is we have conflicting intuitions.

In truth I think "only" in this sentence contributes both to our intuitions of 
what makes an adverb and to our intuitions of what makes a determiner (and to 
other analyses besides). The two analyses represent two alternate ways of 
ordering word combinations in the language. Neither order is more true, but 
you can't have both orders at once.
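
To make the "only" case concrete, here is a minimal sketch in Python. The toy 
corpus is invented for illustration and is not data from the tagging dispute, 
and the helper names contexts and shared are hypothetical. It collects the 
(left, right) neighbour pairs of each word and checks which contexts "only" 
shares with "just" and which it shares with "early".

```python
from collections import defaultdict

# Toy corpus, invented for illustration only (an assumption, not real data).
SENTENCES = [
    "he came only yesterday",
    "he came just yesterday",
    "he came early yesterday",
    "she left just now",
    "she left only now",
    "they arrived early today",
    "they arrived only today",
]


def contexts(sentences):
    """Map each word to the set of (left, right) neighbour pairs it occurs in."""
    ctx = defaultdict(set)
    for sentence in sentences:
        # Pad with boundary markers so every word has two neighbours.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            ctx[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return ctx


def shared(ctx, a, b):
    """Contexts in which both word a and word b are observed."""
    return ctx[a] & ctx[b]


ctx = contexts(SENTENCES)
with_just = shared(ctx, "only", "just")    # evidence for grouping with "just"
with_early = shared(ctx, "only", "early")  # evidence for grouping with "early"
contested = with_just & with_early         # usages claimed by both groupings

print(sorted(with_just))
print(sorted(with_early))
print(sorted(contested))
```

In this toy data the context ("came", "yesterday") supports both groupings, so 
a hard partition of words into classes has to assign that evidence to one 
grouping at the expense of the other.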

> The Uncertainty Principle goes way further than just stating that you
> can't know everything at once - it makes very precise statements about
> what you can't know, based on what you've already measured. It thus
> appears to be a very special kind of "knowledge incompleteness"
> argument, and I don't know if it has linguistic counterparts.

I think almost any distributional analysis will have this property, unless the 
underlying distribution is produced by rules. If language were produced by 
rules, I accept that there would be no overlap, and thus no conflict, between 
the categories. But we do see conflict.

The failure of linguistics over the last 50 years to find objective rules 
describing language is something we could take as evidence of this. People 
have even measured the disagreement. As I said, Ken Church claimed as many as 
3% of tagging decisions were disputed, even after negotiation between the 
taggers, in a study he published in 1992.
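
A disagreement rate of this kind can be computed roughly as below. This is 
only a sketch: the function name disagreement_rate is hypothetical, the two 
tag sequences are the toy "only" example from earlier in this message rather 
than data from the 1992 study, and the 3% figure is not reproduced here.

```python
def disagreement_rate(tags_a, tags_b):
    """Fraction of token positions on which two taggers chose different tags."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must tag the same token sequence")
    if not tags_a:
        return 0.0
    disputed = sum(a != b for a, b in zip(tags_a, tags_b))
    return disputed / len(tags_a)


# Illustrative input: the two competing analyses of "He came only yesterday".
tagger_a = ["PRON", "V", "DET", "N"]
tagger_b = ["PRON", "V", "ADV", "N"]

rate = disagreement_rate(tagger_a, tagger_b)
print(rate)
```

On a four-token sentence a single disputed tag gives a very high rate, of 
course; a corpus-level figure would be computed over many sentences.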

Finally I think we have even more venerable evidence. You may have seen the 
thread a couple of months back where I raised Chomsky's observations from 
back in the '50s that grammar could not be learned distributionally. 
Chomsky's observations were of just the kind we seek. Bizarrely, the 
conclusion he drew from them was that _distributional methods_ were wrong, 
rather than the premise that the distributions were parameterized by rules 
(even more bizarrely, linguistics went for it, and threw out distributional 
methods, and corpora, for 20 years). But the experimental results 
(indeterminacy in distributionally learned grammar) can be interpreted either 
way.

-Rob


