[Corpora-List] Chomsky and computational linguistics

Dominic Widdows widdows at maya.com
Wed Jul 11 18:13:12 UTC 2007


> Mike Maxwell wrote:
>>> And crucial constructions may simply be so rare that you just can't
>>> find them in the available corpus.
>
> To which Ramesh Krishnamurthy replied:
>> How can extremely rare events be evaluated as 'crucial'?
>> Crucial for what or to whom?
>
> To which Mike replied
> Crucial for theoretical linguistics.  If they are rare but reproduceable
> (in the sense that a number of native speakers can agree on their
> acceptability), then they potentially show something about how we acquire
> language.  The argument is that if they're rare, then we're unlikely to
> have learned them (unlike the case with irregular verbs, say, which are
> largely learned by rote--and this argument has to be justified more than I
> can do here).  So if we all (or some significant population) agree that
> they're acceptable, but we couldn't have learned them, then they must
> somehow be a result of what we have learned, and that may throw
> interesting light on how and what we learn when we learn language.  (The
> paradigm case of this, of course, is that of parasitic gaps.)

Many aspects of this discussion have been interesting; this is one
I'd like to comment on briefly.

In many sciences, apparently rare cases turn out to be incredibly  
important. Very few objects in the natural world turn out to be  
magnetic, but once we discover that some are, and try to understand  
what causes such weird behaviour, we gradually discover that what we  
first glimpsed in a few rare iron, nickel and cobalt rocks is key to  
understanding the way the universe works. The rare inconsistencies in
the orbit of Mercury were the first pointer to general relativity, and
black-body radiation was the first pointer to quantum theory. (This
is a fairly fun game to play, especially with hindsight, though it  
can go on for a long time!)

One problem is that it's never clear to begin with whether a rarity  
is a pointer to something important, or "just one of those things".  
Physics and mechanics are perhaps simple and well-understood enough  
that any exception should draw our immediate attention. But in  
sciences that describe more complex systems (biology, linguistics,  
economics, can I mention history as a science these days?), focussing  
immediately on the exotic can lead you to miss the wood for the trees.

Sooner or later one gets to basic philosophical differences. The  
rationalist will argue that the purpose of science in these areas is  
to try and discover their basic principles so that they become as  
well understood as physics and mechanics. The empiricist will argue  
that the world is truly complex, that the world as a "system" has a  
huge description length, and whatever basic principles you think  
you've discovered, you need to supplement them with a great array of
contingent facts for these principles to be usefully employed.

Many of us on this list are really engineers as well as scientists,  
and that gives us an extra motivation for trying to cover the more  
prevalent phenomena rather than the rarities. If you're a metalworker  
building a bridge, you may find magnetism really interesting, and you  
may think that gravity is rather dull by comparison, but your job as  
an engineer is to keep the people crossing the bridge from falling  
into the ravine below!

Going back to the title of this thread, since both computational  
linguistics and Chomsky's approach to theoretical linguistics are  
rather new, we are still feeling our way towards the right balance of  
scientific methods - every science seems to rediscover the tension  
between Plato and Aristotle afresh. Unfortunately this process seems
to be fraught with dismissal of other approaches, put-downs and
name-calling ("armchair", "shallow", "statistical" as a criticism, etc.),
but I think that many of us are looking for insight from all sides  
here and are trying to feel our way towards the right balance for the  
science of language, wherever it may be.

Yours,
Dominic


