[Corpora-List] Chomsky and computational linguistics
Dominic Widdows
widdows at maya.com
Wed Jul 11 18:13:12 UTC 2007
> Mike Maxwell wrote:
>>> And crucial constructions may simply be so rare that you just can't
>>> find them in the available corpus.
>
> To which Ramesh Krishnamurthy replied:
>> How can extremely rare events be evaluated as 'crucial'?
>> Crucial for what or to whom?
>
> To which Mike replied:
> Crucial for theoretical linguistics. If they are rare but reproducible
> (in the sense that a number of native speakers can agree on their
> acceptability), then they potentially show something about how we
> acquire language. The argument is that if they're rare, then we're
> unlikely to have learned them (unlike the case with irregular verbs,
> say, which are largely learned by rote--and this argument has to be
> justified more than I can do here). So if we all (or some significant
> population) agree that they're acceptable, but we couldn't have learned
> them, then they must somehow be a result of what we have learned, and
> that may throw interesting light on how and what we learn when we learn
> language. (The paradigm case of this, of course, is that of parasitic
> gaps.)
There are many aspects of this discussion that have been interesting;
this is one I'd like to comment on briefly.
In many sciences, apparently rare cases turn out to be incredibly
important. Very few objects in the natural world turn out to be
magnetic, but once we discover that some are, and try to understand
what causes such weird behaviour, we gradually discover that what we
first glimpsed in a few rare iron, nickel and cobalt rocks is key to
understanding the way the universe works. The rare inconsistencies in
the orbit of Mercury were the first pointer to general relativity;
black body radiation was the first pointer to quantum theory. (This
is a fairly fun game to play, especially with hindsight, though it
can go on for a long time!)
One problem is that it's never clear to begin with whether a rarity
is a pointer to something important, or "just one of those things".
Physics and mechanics are perhaps simple and well-understood enough
that any exception should draw our immediate attention. But in
sciences that describe more complex systems (biology, linguistics,
economics, can I mention history as a science these days?), focussing
immediately on the exotic can lead you to miss the wood for the trees.
Sooner or later one gets to basic philosophical differences. The
rationalist will argue that the purpose of science in these areas is
to try and discover their basic principles so that they become as
well understood as physics and mechanics. The empiricist will argue
that the world is truly complex, that the world as a "system" has a
huge description length, and whatever basic principles you think
you've discovered, you need to supplement them with a great array of
contingent facts for these principles to be usefully employed.
Many of us on this list are really engineers as well as scientists,
and that gives us an extra motivation for trying to cover the more
prevalent phenomena rather than the rarities. If you're a metalworker
building a bridge, you may find magnetism really interesting, and you
may think that gravity is rather dull by comparison, but your job as
an engineer is to keep the people crossing the bridge from falling
into the ravine below!
Going back to the title of this thread, since both computational
linguistics and Chomsky's approach to theoretical linguistics are
rather new, we are still feeling our way towards the right balance of
scientific methods - every science seems to rediscover the tension
between Plato and Aristotle afresh. Unfortunately this process seems
to be fraught with dismissal of other approaches, put-downs and
name-calling ("armchair", "shallow", "statistical" as a criticism, etc.),
but I think that many of us are looking for insight from all sides
here and are trying to feel our way towards the right balance for the
science of language, wherever it may be.
Yours,
Dominic