[Corpora-List] Looking for linguistic principles
Stefan Bordag
sbordag at informatik.uni-leipzig.de
Sat Oct 15 08:59:13 UTC 2005
Dear Rob,
> > Perhaps it was precisely the lack of these materials [large corpora,
> > availability of machines] which made the structuralist programme
> > infeasible during the 1950s, rather than some fundamental theoretical
> > flaw.
>
> "Perhaps", but what was the "theoretical flaw"?
As I wrote, the criticism was probably not even directed against the
'distributional' or structuralist programme at all, but instead at
behaviorism. But then again, I unfortunately have not read enough about
phonology to make informed comments on that part of the debate. I am
still digesting John Goldsmith's answer. ;-)
> > And I might add a little further up in the same section of Finch's
> > dissertation:
> >
> > This [structuralist] paradigm was criticised by Chomsky (57) for
> > failing to properly dissociate the definition of what structure
> > existed in natural language from the procedures which allowed that
> > structure to be found
>
> This may be it, though I'm not clear exactly what Finch means by
> "failing to ... dissociate the ... structure ... from the
> procedures..." Does he mean Chomsky observed different procedures
> resulted in different structures?
What Finch means is actually pretty clear. The problem Chomsky seemed to
criticise can be summarized in the following exaggerated example. I want
to write a program that is supposed to find word classes automatically. I
then have several possibilities. One is that I simply put into the
program all correct assignments of words to word classes. In that case
the program will operate flawlessly, of course. And yet it will not
discover any kind of structure, because I have already put all the
structure into it.
The other possibility is that I write a clustering mechanism that makes use of
comparisons of words based on their contexts (the pure distributional
method). If the program then comes up with several word classes and all
words assigned to the different word classes, then the program has found
the structure, not me. And I guess that the potential of clustering and
this contrastive method of comparison (which is really independent of
the language level used) is what Chomsky didn't understand, although that
seems rather unlikely. But then again, clustering is no fun (= too much
work, see also the citation of Martinet I offered in my previous email) if
there are no computers to do it.
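To make the second possibility concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a description of any particular system: the toy corpus, the use of immediate left and right neighbours as context, cosine similarity, and the single-pass clustering with a similarity threshold are all choices made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = [
    "the dog sleeps", "the cat sleeps", "a dog runs",
    "a cat runs", "the dog runs", "a cat sleeps",
]

def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Single pass: a word joins the first cluster whose summed context
    vector is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (member words, summed context counts)
    for word in sorted(vectors):
        for members, centroid in clusters:
            if cosine(vectors[word], centroid) >= threshold:
                members.append(word)
                centroid.update(vectors[word])  # Counter.update adds counts
                break
        else:
            clusters.append(([word], Counter(vectors[word])))
    return [members for members, _ in clusters]

classes = cluster(context_vectors(toy_corpus))
print(classes)
```

On this toy corpus the words fall into three groups that look like determiners, nouns and verbs, although no class labels and no number of classes were given to the program. The threshold is of course still a parameter I chose by hand, so the sketch is not entirely free of outside hints either.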
Then again, Chomsky might also have meant that while it might be possible
to find the different word classes - that still doesn't help to find
rules! But as the first automatic grammar induction experiments show, this
is not really an issue either. Simply put, it is enough to allow the
system to posit, say, context-free rules as a way of compressing a
lossless representation of the language in question; it may then find
them using nothing but context comparisons, i.e. the distributional
method. It's all
about the *kind* of structure that we assume to be in the language. We
assume that there are classes of elements, so we design an algorithm that
finds all possible useful or meaningful classes (free morphemes vs. bound
morphemes, nouns vs. verbs, etc) and assignments to these classes.
We assume there are rules, so we design an algorithm that finds rules
(on the morpheme level, on the sentence level, etc). But as soon as we
give the system hints, such as how many word classes to find, we are
actually putting into the system the very structure it was supposed to
find.
Otherwise, as I said, I cannot comment too much on the phonological
debate, so I would just refer to the answer of John Goldsmith.
By the way, Diana Santos has suggested the books Empirical Linguistics
and Educating Eve (the latter now in a new edition called "The 'Language
Instinct' Debate") by Geoffrey Sampson, which are highly relevant to this
discussion.
Best regards,
Stefan Bordag
--
---------------------------------------------------------------------
- Bordag Stefan, sbordag at informatik.uni-leipzig.de -
- Institut fuer Informatik, Abt. Automatische Sprachverarbeitung -
- Universitaet Leipzig -
---------------------------------------------------------------------