[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R'--re Louw's endorsement
Wolfgang Teubert
w.teubert at bham.ac.uk
Fri Aug 29 10:14:26 UTC 2008
I am grateful to Yorick for letting us know about his philosophy of language. Indeed I scanned the second, third and fourth chapters of the book he recommended (these are the ones, I was told, which are most certainly written by himself), and I learned a lot, as always when I listen to Yorick. He is such a great, big and impressive man, and I admire him very much for his outspokenness and his sense of abrasive humour.
But we mustn't forget that we all carry our agenda around. Yorick's is firmly transatlantic, and I believe he has been involved in a number of projects funded by the NSF. I would be surprised, however, to hear that he has also been involved in projects funded by the NEH.
Over a hundred years ago, Ferdinand de Saussure elevated linguistics from the mediocrity of les sciences humaines to the status of a 'hard science'. Sciences are about facts, and about systematic behaviour. So there has to be a language system, consisting (for Chomsky but not for Saussure) of eternal, universal and hard-wired laws, like the second law of thermodynamics, laws, that is, that no human ingenuity can simply violate. Chomsky used to call them rules. So far, the only one of these laws that makes any sense to me is that a spoken utterance must have a beginning and an end - the law of linearity. So far, though, no language mechanism has ever been discovered, even though uncountable models have been invented. There also seems to be, particularly when it comes to meaning, a certain scarcity of 'brute facts'. No one has ever been able to give us a credible representation of a mental concept.
Cognitive linguistics, and particularly its illegitimate offspring, NLP, however, seemed ideal to uphold the claim to scientificity. Indeed the starting point of the cognitive sciences was the computational theory of the mind. There was the blueprint for the mind as a hard-wired but programmable mechanism. It only needed to be filled, and there was the heritage of structuralism, the sememes and the semes Greimas and Pottier had invented and a lot of structuralist work done on the old continent. Not long thereafter, the cognitivists sold their blueprint back to the computer scientists who had now become interested in NLP, and in MT and AI. Sememes became reified and were called concepts, while semes morphed into semantic primes. Suddenly they were everywhere, in the cognitive literature, in the NLP literature, in transatlantic introductions to the philosophy of language. Here they were merged with the realists' obsession for truth, metaphysical or otherwise, the heritage of the Vienna Circle and the staple fare of analytic philosophy. But did it work? One only has to read Alan Melby's devastating account of the early history of MT and why it had to fail. That didn't keep interested parties from funding Eurotra, this perfect translation platform for EU documents, or the ultimate solution to all AI problems, CYC, the first conceptual ontology endowed with common sense reasoning, or Verbmobil, the German attaché case translating spoken German into spoken Japanese and vice versa. All great success stories.
Now the solution is statistics. Statistical models have the advantage that one can always fine-tune them a bit more, and keep the cashflow from drying up. I am not saying it's senseless. Progress is being made. Speech recognition now can distinguish 'yes' from 'no'. Systran is getting better all the time by attacking errors one by one. Google is working nicely. Summarisation is a great success. This is Yorick's world, and thanks to people like him who have a sense of history and look beyond their limited tasks, things are getting better. It is people like him who are needed there. For he knows that in the end language cannot be reduced to a mechanism, a computer model that would always come up with nice results regardless of how people actually use language and whether they agree with these results. Their fault!, shout the language engineers. They have to change the way they speak. Here is our controlled language. Use it and MT works just perfectly.
The goal of NLP, of MT and of AI is to make computers process language and thus to develop applications to make our lives easier. That's fine, and that has to go on. After the demise of the original model, the people working in these fields have found that they need real language data, and a lot of them, to develop better models and turn them into marketable applications. Thus there is a huge demand for speech corpora, for parallel corpora and for many other kinds of domain-specific corpora. But this does not turn these people into corpus linguists. Corpus linguistics, as I see it, is a branch of linguistics, not of the NLP sciences.
Linguistics is the study of real, human language, not the development of useful gadgets simulating the use of language. For me, it therefore belongs to a large extent to the human sciences. The key difference is, for me, that the human sciences are not concerned with 'brute facts' but with interpreting the discourse, everything that has been said and is continually being said. This holds for art history, for anthropology, for theology, for certain branches of social and cultural studies, and, of course, for linguistics as far as it is concerned with meaning. Meaning is not a computational phenomenon. It may be a mental phenomenon, but then we don't have access to the mind. It certainly is a social phenomenon.
Linguists, though, while they may be experts in many things such as phonology, syntax rules, language history, dialectology, are not specialists in meaning. The interpretation of a text, a text segment, a phrase or a word in its context is a task for the whole 'interpretive community' (Stanley Fish), and the linguist is welcome to participate in it. But they do not have any privileged knowledge concerning meaning. The meaning of signs is always the (provisional, never final) result of a never-ending negotiation between the sign users.
The transatlantic philosophy of language is not as limited as the NLP-focused accounts of it make us believe. There are the contributions of the pragmatists, Peirce, Dewey, James, of the behaviourists including Mead and Blumer. There is also social constructionism, there is social epistemology, there is Feyerabend and Watzlawick, there is Rorty and the 'internal realism' of Hilary Putnam. Their texts are rarely referenced in NSF applications.
Philosophy of language also takes place outside the Anglophone world. There is Russian formalism (Bakhtin/Volosinov), there is hermeneutics (Gadamer and Ricoeur), there is post-structuralism, to name but a few. Of course, there is no "tutorial" that tells us all there is to know about these deviations from orthodoxy. We have to deal with them as autodidacts, reading their texts without being told what they mean.
I won't be around for the next week or so.
Wolfgang