[Corpora-List] Looking for linguistic principles

Fri Oct 14 08:44:14 UTC 2005

Dear Rob,

First I would like to provide several relevant citations. The first is from
Martinet (in my own free translation into English, since I currently have
only the German version of his introduction to linguistics; it would be
nice if someone could provide me with the proper English translation):

Some linguists have stated the ideal to produce a description method [of 
language] that excludes the meaning of meaningful [language] units. [...] 
It would be possible to arrive at a complete description of the language 
and it would be possible to compile a grammar and a lexicon that would 
lack only the definitions [of the words] in the way they are present in 
current lexicons. In reality, no linguist has yet had the idea of analysing
and describing a language he does not know at all in such a way. Such an
undertaking would by all accounts require an expense of time and energy 
that has deterred even those who consider this approach as the only 
theoretically acceptable one. [...] (free translation by the author of this
work) \cite{Martinet:69}

The second citation is from Finch's 1993 dissertation:

Perhaps it was precisely the lack of these materials [large corpora, 
availability of machines] which made the structuralist programme 
infeasible during the 1950s, rather than some fundamental theoretical 
flaw.

And I might add, from a little further up in the same section of Finch's
dissertation:

This [structuralist] paradigm was criticised by Chomsky (57) for failing 
to properly dissociate the definition of what structure existed in natural 
language from the procedures which allowed that structure to be found, and 
of being too ambitious in any case, there not being enough information in 
a corpus of a natural language to define its structure.

However, as another set of important citations from Roy Harris's book
'Saussure and His Interpreters' (2001) shows:

There seems to be no indication that Noam Chomsky, founder of modern 
generative linguistics, had ever read or paid attention to the work of 
Saussure until the appearance, in 1959, of the first English translation 
of the Cours de linguistique générale (Baskin 59).

It appears that Chomsky was criticizing something else (since this places
his first reading of Saussure after his 1957 publication). Also from Roy
Harris's book:

Having recently published a swingeing attack on behaviorism in linguistics 
(Chomsky 1959), Chomsky was looking retrospectively for pre-Bloomfieldian 
champions of 'mentalism' who could be posthumously resurrected as avatars 
heralding his own approach to linguistic theory. The welcome was only a 
cautious one, however, because the Cours de linguistique did not prima
facie look at all like a generativist treatise in embryo.

So it looks as if he was attacking behaviorism (which indeed cannot provide
generalizations), not structuralism (and therefore not the distributional
methods that are based on structuralism). At that time he did not really
know structuralism, and later he simply failed to understand (or to
acknowledge) it properly, because he was trying to narrow language to a
purely generativist point of view (in which, unfortunately, he mostly
succeeded, it appears) and to make prominent his distinction between
performance and competence.

For my own part, I would agree that Chomsky's criticism does indeed seem to
have 'killed' most research efforts on distributional methods. But that was
perhaps also a good thing, because back then the two essential resources
named by Finch (large corpora and available machines) simply were not
available, and research in pure grammar (which is not necessarily dependent
on these resources) has brought many insights since.

However, observing the increasing amount of quite varied work in that
field, including not only Finch's but also Burghard Rieger's, Andrea Lehr's
(not to forget Andre Martinet's) and Reinhard Rapp's, as well as applied
work such as Goldsmith's and many others, I would not say that this
objection was never really addressed. It is simply that the field of
linguistics has grown so large that most 'true' linguists in the sense of
'generativists' (but also formal semanticists, formal semantics being an
offspring of generativism) have not yet become fully aware of the
significance of the reappearance of the distributional method and its new
potential given the availability of the two essential resources. The
current disadvantage is that this kind of research is sometimes still
considered inferior to true generativist research.

Of course this is also due to the fact that currently only very basic
notions, such as the distinction between word classes, can be extracted
fully automatically from raw text, whereas generativists are currently
occupied with far more intricate questions, as the titles of talks at
recent conferences show. This might indicate that generality is indeed
hardly attainable with such methods. But I would also say that the field
is approaching the moment where 'generality', e.g. a grammar of a given
language, can be extracted in a fully automatic way from raw text, without
any introspection. First works (e.g. Henrichsen: "GraSp: Grammar learning
from unlabelled corpora") are already appearing and represent first
cautious experiments, and I think that several years from now the results
will already suffice to be of practical use. Of course, the traditional
objection that manually created grammars are always better will remain for
a very long time afterwards, counterbalanced only by the simple cost
factor. ;-)

Best regards,
Stefan Bordag

-- 
---------------------------------------------------------------------
- Bordag Stefan, sbordag at informatik.uni-leipzig.de                  -
- Institut fuer Informatik, Abt. Automatische Sprachverarbeitung    -
- Universitaet Leipzig                                              -
---------------------------------------------------------------------