[Corpora-List] Looking for linguistic principles

Stefan Bordag sbordag at informatik.uni-leipzig.de
Sat Oct 15 08:59:13 UTC 2005


Dear Rob,

> > Perhaps it was precisely the lack of these materials [large corpora,
> > availability of machines] which made the structuralist programme
> > infeasible during the 1950s, rather than some fundamental theoretical
> > flaw.
>
> "Perhaps", but what was the "theoretical flaw"?

As I wrote, the criticism was probably not even directed against the
'distributional' or structuralist programme at all, but rather against
behaviorism. But then again, I unfortunately have not read enough about
phonology to comment knowledgeably on the part of the debate that took
place there. I am still digesting John Goldsmith's answer. ;-)

> > And I might add a passage from a little further up in the same section
> > of Finch's dissertation:
> >
> > This [structuralist] paradigm was criticised by Chomsky (57) for
> > failing to properly dissociate the definition of what structure 
> > existed in natural language from the procedures which allowed that 
> > structure to be found
>
> This may be it, though I'm not clear exactly what Finch means by "failing
> to ... dissociate the ... structure ... from the procedures..." Does he mean
> Chomsky observed different procedures resulted in different structures?

What Finch means is actually pretty clear. The problem Chomsky seemed to
criticise can be summarized with the following exaggerated example.
Suppose I want to write a program that finds word classes automatically.
I then have several options. One is simply to put into the program all
correct assignments of words to word classes. In that case the program
will of course operate flawlessly. And yet it will not discover any
structure, because I have already put all the structure into it.
The other option is to write a clustering mechanism that compares words
based on their contexts (the pure distributional method). If the program
then comes up with several word classes and assigns all words to them,
then the program has found the structure, not me. And I suspect that the
potential of clustering and of this contrastive method of comparison
(which is really independent of the language level it is applied to) is
what Chomsky didn't understand, although that sounds rather unlikely. But
then again, clustering is no fun (i.e. far too much work; see also the
Martinet quotation I offered in my previous email) if there are no
computers to do it.
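To make the second option concrete, here is a minimal Python sketch under simplifying assumptions of my own: each word is represented only by counts of its immediate left and right neighbours, words are compared by cosine similarity, and clusters are merged by single-link agglomeration whenever two members are at least `threshold` similar. The function names and the `threshold` parameter are hypothetical, chosen for illustration. The program is never told which classes exist or how many there are, although the threshold is itself a choice made by the experimenter.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences):
    """Map each word to counts of its immediate left (L:) and right (R:) neighbours."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # sentence boundary markers
        for i in range(1, len(padded) - 1):
            word = padded[i]
            vecs[word]["L:" + padded[i - 1]] += 1
            vecs[word]["R:" + padded[i + 1]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vecs, threshold):
    """Single-link agglomerative clustering: keep merging two clusters while some
    pair of their members has cosine >= threshold. The number of clusters is not
    fixed in advance; it falls out of the data and the threshold."""
    clusters = [{w} for w in sorted(vecs)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine(vecs[a], vecs[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    other = clusters.pop(j)  # j > i, so index i stays valid
                    clusters[i] |= other
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    corpus = ["the dog sleeps", "the cat sleeps", "a dog runs", "a cat runs"]
    sentences = [s.split() for s in corpus]
    for group in cluster(context_vectors(sentences), threshold=0.5):
        print(sorted(group))
```

On this toy corpus the sketch groups "a"/"the", "cat"/"dog" and "runs"/"sleeps", i.e. something like determiners, nouns and verbs, purely because those words share neighbours. Real corpora are of course far noisier, and single-link merging is only one of many possible clustering choices.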

Then again, Chomsky might also have meant that even if it is possible
to find the different word classes, that still doesn't help to find
rules! But as the first automatic grammar induction experiments show,
this is not really an issue either. Simply put, if the system is allowed
to find, say, context-free rules in order to build a compressed, lossless
representation of the language in question, it may still find them
using nothing but context comparisons, i.e. the distributional method.
It's all about the *kind* of structure that we assume to be in the
language. We assume that there are classes of elements, so we design an
algorithm that finds all possible useful or meaningful classes (free
morphemes vs. bound morphemes, nouns vs. verbs, etc.) and the
assignments to these classes.
We assume there are rules, so we design an algorithm that finds rules
(on the morpheme level, on the sentence level, etc.). But as soon as we
give the system hints such as how many word classes to find, we are
actually putting into the system the structure it was supposed to
find.
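The rule-finding side can be sketched in the same spirit. The following is a simplified variant of Re-Pair, a known grammar-based compression scheme, and not a reconstruction of the grammar induction experiments mentioned above: it repeatedly replaces the most frequent adjacent pair of symbols with a new nonterminal and records a context-free rule for it, so every rule is motivated purely by what occurs next to what. The names `induce_grammar` and `expand` and the `<N0>`-style nonterminal labels are hypothetical. Unlike real Re-Pair, this sketch counts pairs naively (including overlapping ones) and is not efficient.

```python
from collections import Counter


def induce_grammar(tokens):
    """Simplified Re-Pair: while some adjacent pair occurs at least twice, replace
    the most frequent pair with a fresh nonterminal and record the rule
    nonterminal -> (left, right). Returns the compressed start sequence and rules."""
    seq = list(tokens)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so no rule would save space
            break
        nonterminal = f"<N{next_id}>"  # hypothetical label; assumed not to occur in the input
        next_id += 1
        rules[nonterminal] = pair
        # Greedy left-to-right replacement of the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively rewrite a symbol back into terminals (shows the grammar is lossless)."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    corpus = "the dog sleeps the dog runs the cat sleeps".split()
    start, rules = induce_grammar(corpus)
    print(start)
    print(rules)
    # Expanding the start sequence reproduces the input exactly.
    assert [t for s in start for t in expand(s, rules)] == corpus
```

Here the only repeated pair is "the dog", so the sketch yields the single rule `<N0> -> the dog` and the start sequence `<N0> sleeps <N0> runs the cat sleeps`. Such a grammar captures only literal repetitions; a rule generalizing over all nouns would require first replacing words by distributionally induced classes and then compressing the class sequence.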

Otherwise, as I said, I cannot comment much on the phonological
debate, so I will simply refer you to John Goldsmith's answer.

By the way, Diana Santos has suggested Geoffrey Sampson's books
"Empirical Linguistics" and "Educating Eve" (the new edition of the latter
is called "The 'Language Instinct' Debate"), which are highly relevant to
this discussion.

Best regards,
Stefan Bordag

-- 
---------------------------------------------------------------------
- Bordag Stefan, sbordag at informatik.uni-leipzig.de                  -
- Institut fuer Informatik, Abt. Automatische Sprachverarbeitung    -
- Universitaet Leipzig                                              -
---------------------------------------------------------------------


