[Corpora-List] Chomsky and computational linguistics

Rob Freeman lists at chaoticlanguage.com
Tue Jul 31 03:33:21 UTC 2007


On 7/27/07, John F. Sowa <sowa at bestweb.net> wrote:
>
> Rob,
>
> The theme I was trying to get across was the summary line,
> which I quoted from C. S. Peirce:
>
>     "Do not block the way of inquiry."


Glad to hear you are open to new ideas, John.

> > Chomsky saw that structural descriptions of language based
> > on corpora failed.
>
> Chomsky saw no such thing.  Have you read the book by C. C. Fries?
> He used a corpus that was tiny by today's standards, but Fries was
> able to develop a decent structural description of English from it.
>
> And there was a long history of people who derived good grammars
> for dead languages from corpora -- including C's former teacher
> Zellig Harris, who derived a grammar of Phoenician.  In fact, Harris
> postulated the idea of transformations as a result of his work on
> corpora.


Yes such thing.

Of course. Structural linguistics was wildly successful. Everything seemed
tidy. There were just the details to clear up. Harris was extending it when
Chomsky was his student. But Chomsky saw a problem.

Broadly put, I think you can summarize the problem as being that you find
too many grammars (too much success?), and these multiple grammars
contradict each other.

As I've posted here before, this problem manifested itself principally in
phonemics.

We may have forgotten about it, but at the time people saw the issue clearly
enough for American Structuralism to disappear as a school. What was left
were variants of Chomsky's school of innate grammar, either purely syntactic
or later semantic, and those who stuck to the methods of functional contrast
but abandoned the goal of structure.

> > There is today, still, no (major?) branch of theoretical
> > linguistics which teaches a basis in observable structure.
>
> You seem to be using the term "observable structure" as implying
> that induction alone (also known as "data mining") is the only way
> to analyze data.  But Peirce noted that there are three fundamental
> methods of logic:  deduction, induction, and abduction.
>
> Deduction cannot derive anything that wasn't implicit in the starting
> assumptions.  Induction derives new hypothesis by a systematic search
> for hidden patterns.  Abduction pulls a hunch, a wild guess, or a
> brilliant insight out of thin air -- which must then be tested by
> deduction and abduction.


Which modern branch of theoretical linguistics teaches a basis in
"observable structure" (of any kind)?

Modern corpus linguistics is not an exception. To the extent corpus
linguistics exists as a branch of theory, it does not teach that corpora are
a good way to learn grammar, innate or otherwise. It teaches that corpora
are irreducibly complex, and if we want to know anything about language we
must refer directly to the corpus.

Otherwise, why would we need the corpus? Just get a good grammar.

Are we forgetting this too?

> > I think Chomsky's detailed conclusion that the structure of
> > language was not observable because it is innate, was wrong.
> > But I think he was right that a regular structure in language
> > can't be observed.
>
> I agree that in linguistics, as in most sciences, induction alone is
> not sufficient to derive deep insights.  It is a useful, but weak
> method, which must be supplemented by something else.  Harris's
> hypothesis of transformations was a good example of an abduction.
>
> Another excellent abduction was the hypothesis that the coptic
> language was a later stage of the language spoken by the pharaohs.
> That hypothesis was key to decoding the phonetic markers in the
> hieroglyphics that were difficult to decipher from the corpus alone.
>
> Chomsky's suggestion of using a native speaker's intuition is
> another example of using insight as a source of abductions.
> That was a fine idea.  But those insights are best used as
> a *supplement* to a corpus, not as a *replacement* for it.


I'm not sure I've ever seen Chomsky's Universal Grammar characterized as
a distinction between abductive and inductive reasoning before. Are you
saying Chomsky really thought we could derive grammar from observations of
language, only that we had to do it abductively?

> > The answer is not to be had by moving to semantics, or
> > revalidating statistics again. These don't get to the core
> > of the problem.
>
> I wasn't recommending either of them as the sole answer.  I was
> criticizing Chomsky for ruling out other ideas, and I would not
> recommend replacing his dogma by another dogma based on semantics
> or statistics.  Language is a very large subject, and no single
> methodology is likely to be sufficient to explore all of it.


Well, he took a position, and I kind of respect that.

Perhaps today we are missing out by not taking a position.

What is the position of people here on the issue of structural abstractions
from data? Do we believe a corpus is necessary to get all the fine details
of language usage, or are abstractions enough?

> > The core of the problem I think is indeed to be found in what
> > Ken Litkowski describes as "some Godelian experiences that are
> > not covered" (July 3). There is nothing mysterious in this.
> > It is widely known that some systems, notably that arch "formal
> > system" maths itself, are formally incomplete in this sense.
>
> Goedel's work has been misquoted and misapplied to everything
> that anybody finds difficult to understand.  It's the atheist's
> equivalent of the God hypothesis.


Please explain it to us.

-Rob