[Corpora-List] Looking for linguistic principles
Stefan Bordag
sbordag at informatik.uni-leipzig.de
Fri Oct 14 08:44:14 UTC 2005
Dear Rob,
First, I would like to provide several relevant quotations. The first is
from Martinet (a free translation into English by me, since I currently
have only the German version of his introduction to linguistics; it would
be nice if someone could provide me with the proper English translation):
Some linguists have stated the ideal to produce a description method [of
language] that excludes the meaning of meaningful [language] units. [...]
It would be possible to arrive at a complete description of the language
and it would be possible to compile a grammar and a lexicon that would
lack only the definitions [of the words] in the way they are present in
current lexicons. In reality, no linguist has yet had the idea of analysing
and describing a language he does not know at all in such a way. Such an
undertaking would by all accounts require an expense of time and energy
that has deterred even those who consider this approach as the only
theoretically acceptable one. [...] (free translation by the author of this
work) \cite{Martinet:69}
The second quotation is from Finch's 1993 dissertation:
Perhaps it was precisely the lack of these materials [large corpora,
availability of machines] which made the structuralist programme
infeasible during the 1950s, rather than some fundamental theoretical
flaw.
And, I might add, a little further up in the same section of Finch's
dissertation:
This [structuralist] paradigm was criticised by Chomsky (57) for failing
to properly dissociate the definition of what structure existed in natural
language from the procedures which allowed that structure to be found, and
of being too ambitious in any case, there not being enough information in
a corpus of a natural language to define its structure.
However, as another set of important quotations from Roy Harris's book
'Saussure and his Interpreters' (2001) shows:
There seems to be no indication that Noam Chomsky, founder of modern
generative linguistics, had ever read or paid attention to the work of
Saussure until the appearance, in 1959, of the first English translation
of the Cours de linguistique générale (Baskin 59).
It appears that Chomsky was criticizing something else (since this places
his reading of Saussure after his 57 publication).
Also from Roy Harris's book:
Having recently published a swingeing attack on behaviorism in linguistics
(Chomsky 1959), Chomsky was looking retrospectively for pre-Bloomfieldian
champions of 'mentalism' who could be posthumously resurrected as avatars
heralding his own approach to linguistic theory. The welcome was only a
cautious one, however, because the Cours de linguistique did not prima
facie look at all like a generativist treatise in embryo.
So it looks as if he was attacking behaviorism (which indeed cannot
provide generalizations), not structuralism (and therefore not the
distributional methods which are based on structuralism). At that time he
did not really know structuralism, and later he simply failed to
understand (or to acknowledge) it properly, because he was trying to
narrow language to a purely generativist point of view (in which,
unfortunately, he nevertheless mostly succeeded, it appears) and to make
prominent his distinction between performance and competence.
For my own part, I would agree that Chomsky's criticism did indeed 'kill'
most research efforts on distributional methods. But that was perhaps also
a good thing, because at that time the two essential resources named by
Finch (large corpora and available machines) were simply not available,
and research in pure grammar (which does not necessarily depend on these
resources) has brought many insights since.
However, observing the increasing amount of quite varied work in that
field, including not only Finch's but also Burghard Rieger's, Andrea
Lehr's (not to forget Andre Martinet's), Reinhard Rapp's, as well as
applied work such as Goldsmith's and many others, I would not say that
this objection was never really addressed. It is simply that the field of
linguistics has grown so large that most 'true' linguists in the sense of
'generativists' (and also formal semanticists, formal semantics being an
offspring of generativism) have not yet become fully aware of the
significance of the reappearance of the distributional method and its new
potential given the availability of the two essential resources. The
current disadvantage is that this kind of research is sometimes still
considered inferior to true generativist research.
Of course, this is also due to the fact that currently only very basic
notions, such as the distinction between word classes, can be extracted
fully automatically from raw text, whereas generativists are currently
occupied with far more intricate questions, as the titles of talks at
recent conferences show. This might indicate that generality is indeed
hardly attainable with such methods. But I would also say that the field
is approaching the moment where 'generality', e.g. the grammar of a given
language, can be extracted fully automatically from raw text, without any
introspection. The first works (e.g. Henrichsen: "GraSp: Grammar learning
from unlabelled corpora") are already appearing and represent first
cautious experiments, and I think that several years from now the results
will already suffice to be of practical use. Of course, the traditional
objection that manually created grammars are always better will remain for
a very long time afterwards and will be counterweighed only by the simple
cost factor. ;-)
Best regards,
Stefan Bordag
--
---------------------------------------------------------------------
- Bordag Stefan, sbordag at informatik.uni-leipzig.de -
- Institut fuer Informatik, Abt. Automatische Sprachverarbeitung -
- Universitaet Leipzig -
---------------------------------------------------------------------